Accelerating the Computation of UCB and Related Indices for Reinforcement Learning
Cowan, Wesley, Katehakis, Michael N., Pirutinsky, Daniel
In this paper we derive an efficient method for computing the indices associated with the asymptotically optimal upper confidence bound (MDP-UCB) algorithm of Burnetas and Katehakis (1997) that only requires solving a system of two non-linear equations with two unknowns, irrespective of the cardinality of the state space of the Markovian decision process (MDP). In addition, we develop a similar acceleration for computing the indices of the MDP-Deterministic Minimum Empirical Divergence (MDP-DMED) algorithm developed in Cowan et al. (2019), based on ideas from Honda and Takemura (2011), that involves solving a single equation of one variable. We provide experimental results demonstrating the computational time savings and regret performance of these algorithms. In these comparisons we also consider the Optimistic Linear Programming (OLP) algorithm (Tewari and Bartlett, 2008) and a method based on posterior sampling (MDP-PS).
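The abstract does not reproduce the two-equation system itself, so the sketch below illustrates only the underlying technique, computing an optimistic index by numerical root finding, on the simpler one-dimensional KL-UCB index for a Bernoulli arm; `kl_ucb_index`, `p_hat`, and `pulls` are illustrative names, not the paper's MDP-UCB quantities.

```python
# Hedged illustration: an optimistic index computed by one-dimensional
# root finding (a KL-UCB index for a Bernoulli arm), NOT the paper's
# two-equation MDP-UCB system.
import math
from scipy.optimize import brentq

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(p_hat, pulls, n):
    """Largest q >= p_hat such that pulls * KL(p_hat, q) <= log(n)."""
    budget = math.log(max(n, 2)) / pulls
    hi = 1 - 1e-9
    if kl_bernoulli(p_hat, hi) <= budget:   # even q near 1 is plausible
        return 1.0
    # f is negative at p_hat and positive at hi, so a root exists
    return brentq(lambda q: kl_bernoulli(p_hat, q) - budget, p_hat, hi)
```

The point of the acceleration in the paper is analogous: the index reduces to a small, fixed-size root-finding problem whose cost does not grow with the state space.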
Reinforcement Learning: a Comparison of UCB Versus Alternative Adaptive Policies
Cowan, Wesley, Katehakis, Michael N., Pirutinsky, Daniel
In this paper we consider the basic version of Reinforcement Learning (RL) that involves computing optimal data-driven (adaptive) policies for Markovian decision processes with unknown transition probabilities. We provide a brief survey of the state of the art in this area and compare the performance of the classic UCB policy of Burnetas and Katehakis (1997) with a new policy developed herein, which we call MDP-Deterministic Minimum Empirical Divergence (MDP-DMED), and with a method based on posterior sampling (MDP-PS).
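As a hedged sketch of the posterior-sampling idea behind MDP-PS, the snippet below shows the mechanism on a Bernoulli bandit for brevity; the MDP version would instead sample a transition model from a posterior and solve the sampled MDP. All names here are illustrative.

```python
# Minimal posterior-sampling (Thompson-style) loop on a Bernoulli bandit,
# shown only to illustrate the sample-then-act-greedily mechanism.
import numpy as np

rng = np.random.default_rng(0)

def posterior_sampling(arms, horizon):
    """arms: list of callables returning 0/1 rewards. Beta(1,1) priors."""
    wins = np.ones(len(arms))
    losses = np.ones(len(arms))
    total = 0.0
    for _ in range(horizon):
        theta = rng.beta(wins, losses)   # one posterior draw per arm
        i = int(np.argmax(theta))        # act greedily on the sampled model
        r = arms[i]()
        wins[i] += r
        losses[i] += 1 - r
        total += r
    return total
```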
Optimal Data Driven Resource Allocation under Multi-Armed Bandit Observations
Burnetas, Apostolos N., Kanavetas, Odysseas, Katehakis, Michael N.
Consider the problem of sequentially activating one of a finite number of independent bandits, where each activation of a bandit incurs a number of bandit-dependent resource utilizations, or activation costs. For each resource type, a constraint ensures that the total resource utilized (or, equivalently, cost incurred) at any time does not exceed the current resource availability (budget). It is assumed that following each activation any unused resource amounts can be carried forward for use in future activations. We also assume that successive activations of each bandit yield identically distributed (iid) random rewards with positive means, independent across different bandits, with distributions that depend on unknown parameters. The objective is to obtain a feasible policy that asymptotically maximizes the total expected reward or, equivalently, asymptotically minimizes a regret function. We develop a class of feasible policies that are shown to be asymptotically optimal within a large class of good policies that are uniformly fast (UF) convergent, in the sense of Burnetas and Katehakis (1996) and Lai and Robbins (1985). The results in this paper extend the work in Burnetas et al. (2017), which solved the case in which there exists only one type of constraint for all bandits.
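The budget mechanics described above (per-resource feasibility plus carry-forward of unused amounts) can be made concrete with a small sketch; this is illustrative bookkeeping, not the paper's asymptotically optimal policy, and all names are hypothetical.

```python
# Sketch of the resource accounting: unused budget carries forward, and a
# bandit may only be activated if every resource type can cover its cost.
def feasible(budget, cost):
    return all(budget[r] >= cost[r] for r in cost)

def step(budget, income, costs, choose):
    """budget: dict resource -> amount; income: per-period replenishment;
    costs: dict bandit -> dict resource -> activation cost;
    choose: any selection rule over the feasible bandits."""
    for r in income:                         # new availability this period
        budget[r] = budget.get(r, 0.0) + income[r]
    options = [b for b, c in costs.items() if feasible(budget, c)]
    if not options:
        return budget, None                  # carry everything forward
    b = choose(options)
    for r in costs[b]:
        budget[r] -= costs[b][r]             # pay the activation cost
    return budget, b
```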
Asymptotically Optimal Sequential Experimentation Under Generalized Ranking
Cowan, Wesley, Katehakis, Michael N.
We consider the classical problem of a controller activating (or sampling) sequentially from a finite number of $N \geq 2$ populations, specified by unknown distributions. Over some time horizon, at each time $n = 1, 2, \ldots$, the controller wishes to select a population to sample, with the goal of sampling from a population that optimizes some "score" function of its distribution, e.g., maximizing the expected sum of outcomes or minimizing variability. We define a class of \textit{Uniformly Fast (UF)} sampling policies and show, under mild regularity conditions, that there is an asymptotic lower bound on the expected total number of sub-optimal population activations. Then, we provide sufficient conditions under which a UCB policy is UF and asymptotically optimal, in the sense that it attains this lower bound. Explicit solutions are provided for a number of examples of interest, including general score functionals on unconstrained Pareto distributions (of potentially infinite mean) and uniform distributions of unknown support. Additional results on bandits of Normal distributions are also provided.
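To make the "score function of a distribution" idea concrete, here is the generic shape of a score-based UCB rule; the paper derives population-specific indices and conditions, so this is only an illustrative skeleton with hypothetical names.

```python
# Generic score-based UCB loop: rank populations by a functional of their
# empirical sample plus an optimism bonus. Illustrative only.
import math
from statistics import mean, pstdev

def ucb_score_policy(samples, n, score, c=2.0):
    """samples: list of per-population observation lists; n: current time."""
    best, best_val = 0, -math.inf
    for i, s in enumerate(samples):
        if len(s) < 2:
            return i                      # sample every population twice first
        val = score(s) + math.sqrt(c * math.log(n) / len(s))  # optimism bonus
        if val > best_val:
            best, best_val = i, val
    return best

# e.g. score=mean, or score=lambda s: mean(s) - pstdev(s) to penalize variability
```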
Asymptotically Optimal Multi-Armed Bandit Policies under a Cost Constraint
Burnetas, Apostolos N., Kanavetas, Odysseas, Katehakis, Michael N.
We develop asymptotically optimal policies for the multi-armed bandit (MAB) problem under a cost constraint. This model is applicable in situations where each sample (or activation) from a population (bandit) incurs a known bandit-dependent cost. Successive samples from each population are iid random variables with unknown distribution. The objective is to design a feasible policy for deciding from which population to sample, so as to maximize the expected sum of outcomes of $n$ total samples or, equivalently, to minimize the regret due to lack of information on the sample distributions. For this problem we consider the class of feasible uniformly fast (f-UF) convergent policies, which satisfy the cost constraint sample-path wise. We first establish a necessary asymptotic lower bound on the rate of increase of the regret function of f-UF policies. Then we construct a class of f-UF policies and provide conditions under which they are asymptotically optimal within the class of f-UF policies, achieving this asymptotic lower bound. Finally, we provide the explicit form of such policies for the case in which the unknown distributions are Normal with unknown means and known variances.
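For the Normal case with known variance, the classical inflated-sample-mean index has a closed form: it is the largest mean $\lambda$ consistent with the data at KL budget $\ln n$, which for Normals with common known variance solves $T_i (\lambda - \bar{x}_i)^2 / (2\sigma_i^2) = \ln n$, giving $\bar{x}_i + \sigma_i \sqrt{2 \ln n / T_i}$. The sketch below shows only this unconstrained ingredient; the paper's f-UF policies add a sample-path cost-feasibility layer on top, which is omitted here.

```python
# Classical inflated-mean index for a Normal arm with known variance.
import math

def normal_known_var_index(mean_hat, sigma, pulls, n):
    """x̄ + σ·sqrt(2·ln n / T), from T·KL(N(x̄,σ²), N(λ,σ²)) = ln n."""
    return mean_hat + sigma * math.sqrt(2.0 * math.log(n) / pulls)
```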
Inventory Control Involving Unknown Demand of Discrete Nonperishable Items - Analysis of a Newsvendor-based Policy
Katehakis, Michael N., Yang, Jian, Zhou, Tingting
Inventory control with an unknown demand distribution is considered, with emphasis on the case involving discrete nonperishable items. We focus on an adaptive policy which in every period uses, as much as possible, the optimal newsvendor ordering quantity for the empirical distribution learned up to that period. The policy is assessed using the regret criterion, which measures the price paid for ambiguity about the demand distribution over $T$ periods. When there are guarantees on the latter's separation from the critical newsvendor parameter $\beta=b/(h+b)$, a constant upper bound on the regret can be found. Without any prior information on the demand distribution, we show that the regret does not grow faster than the rate $T^{1/2+\epsilon}$ for any $\epsilon>0$. In view of a known lower bound, this is almost the best one could hope for. Simulation studies involving this policy, along with others, are also conducted.
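The per-period step of such a policy is the standard empirical newsvendor rule: order up to the $\beta$-quantile of the demand distribution learned so far. Below is a minimal sketch of that rule under the usual backlog cost $b$ and holding cost $h$; the function and variable names are illustrative, not the paper's.

```python
# Empirical newsvendor step: smallest q whose empirical CDF reaches
# beta = b / (h + b), computed over the demands observed so far.
import math

def newsvendor_quantity(demands, b, h):
    beta = b / (h + b)
    s = sorted(demands)
    k = math.ceil(beta * len(s))          # rank of the beta-quantile
    return s[min(k, len(s)) - 1]

history = [3, 7, 5, 9, 4, 6]              # observed discrete demands
q = newsvendor_quantity(history, b=4.0, h=1.0)   # beta = 0.8, so q = 7 here
```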
An Asymptotically Optimal Policy for Uniform Bandits of Unknown Support
Cowan, Wesley, Katehakis, Michael N.
Consider the problem of a controller sampling sequentially from a finite number of $N \geq 2$ populations, specified by random variables $X^i_k$, $i = 1, \ldots, N$ and $k = 1, 2, \ldots$, where $X^i_k$ denotes the outcome from population $i$ the $k^{th}$ time it is sampled. It is assumed that for each fixed $i$, $\{ X^i_k \}_{k \geq 1}$ is a sequence of i.i.d. uniform random variables over some interval $[a_i, b_i]$, with the support (i.e., $a_i, b_i$) unknown to the controller. The objective is to have a policy $\pi$ for deciding, based on available data, from which of the $N$ populations to sample at any time $n = 1, 2, \ldots$, so as to maximize the expected sum of outcomes of $n$ samples or, equivalently, to minimize the regret due to lack of information about the parameters $\{ a_i \}$ and $\{ b_i \}$. In this paper, we present a simple inflated sample mean (ISM) type policy that is asymptotically optimal in the sense that its regret achieves the asymptotic lower bound of Burnetas and Katehakis (1996). Additionally, finite horizon regret bounds are given.
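The general shape of an ISM-type index for a uniform arm is to estimate the support from the sample extremes and inflate the upper estimate; the abstract does not give the exact inflation term, so the one below is a placeholder form, labeled as such, not the paper's formula.

```python
# ISM-style index for a uniform arm (illustrative; the inflation term is
# a placeholder, not the paper's exact expression).
import math

def uniform_ism_index(samples, n):
    if len(samples) < 2:
        return math.inf                    # force initial sampling
    a_hat, b_hat = min(samples), max(samples)   # MLE of the support [a, b]
    t = len(samples)
    inflation = (b_hat - a_hat) * (n ** (1.0 / t) - 1.0)  # placeholder form
    return b_hat + inflation
```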
Normal Bandits of Unknown Means and Variances: Asymptotic Optimality, Finite Horizon Regret Bounds, and a Solution to an Open Problem
Cowan, Wesley, Honda, Junya, Katehakis, Michael N.
Consider the problem of sampling sequentially from a finite number of $N \geq 2$ populations, specified by random variables $X^i_k$, $i = 1, \ldots, N$ and $k = 1, 2, \ldots$, where $X^i_k$ denotes the outcome from population $i$ the $k^{th}$ time it is sampled. It is assumed that for each fixed $i$, $\{ X^i_k \}_{k \geq 1}$ is a sequence of i.i.d. normal random variables, with unknown mean $\mu_i$ and unknown variance $\sigma_i^2$. The objective is to have a policy $\pi$ for deciding from which of the $N$ populations to sample at any time $n = 1, 2, \ldots$, so as to maximize the expected sum of outcomes of $n$ samples or, equivalently, to minimize the regret due to lack of information about the parameters $\mu_i$ and $\sigma_i^2$. In this paper, we present a simple inflated sample mean (ISM) index policy that is asymptotically optimal in the sense of Theorem 4 of the paper. This resolves a standing open problem from Burnetas and Katehakis (1996). Additionally, finite horizon regret bounds are given.
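When the variance is also unknown, an ISM-type index replaces the known $\sigma$ with the sample standard deviation and adjusts the inflation term; the exact term is specified by the paper's Theorem 4, so the version below should be read as an illustrative assumption, not the verified formula.

```python
# ISM-style index for a Normal arm with unknown mean and variance
# (illustrative inflation; the paper's exact term is in its Theorem 4).
import math
import statistics

def normal_ism_index(samples, n):
    t = len(samples)
    if t < 3:
        return math.inf                     # force a few initial samples
    mean_hat = statistics.fmean(samples)
    sd_hat = statistics.stdev(samples)      # sample standard deviation
    return mean_hat + sd_hat * math.sqrt(n ** (2.0 / (t - 2)) - 1.0)
```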