
Collaborating Authors: Dimitrakakis, Christos


Fair Set Selection: Meritocracy and Social Welfare

arXiv.org Artificial Intelligence

In this paper, we formulate the problem of selecting a set of individuals from a candidate population as a utility maximisation problem. From the decision maker's perspective, it is equivalent to finding a selection policy that maximises expected utility. Our framework leads to the notion of the expected marginal contribution (EMC) of an individual with respect to a selection policy as a measure of deviation from meritocracy. To solve the maximisation problem, we propose a policy gradient algorithm. For certain policy structures, the policy gradients are proportional to the EMCs of individuals. Consequently, the policy gradient algorithm leads to a locally optimal solution that has zero EMC and satisfies meritocracy. For uniform policies, EMC reduces to the Shapley value, and EMC generalises the fair selection properties of the Shapley value to general selection policies. We experimentally analyse the effect of different policy structures in a simulated college admission setting and compare them with ranking and greedy algorithms. Our results verify that separable linear policies achieve high utility while minimising EMCs. We also show that we can design utility functions that successfully promote notions of group fairness, such as diversity.
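
As a rough illustration of the EMC idea, the sketch below estimates an individual's expected marginal contribution by Monte Carlo, given a stochastic selection policy and a utility function over sets. The names `policy_sample` and `utility` are placeholders, and the precise definition of EMC in the paper may differ from this reading.

```python
import numpy as np

def expected_marginal_contribution(individual, policy_sample, utility, n_samples=1000):
    """Monte Carlo estimate of an individual's expected marginal contribution
    (EMC) with respect to a stochastic selection policy.

    policy_sample() -> a selected set drawn from the policy.
    utility(S)      -> real-valued utility of a selected set S.
    Both are placeholders; the paper's exact EMC definition may differ."""
    total = 0.0
    for _ in range(n_samples):
        selected = set(policy_sample())
        without = selected - {individual}
        # Marginal contribution of the individual to the sampled set.
        total += utility(without | {individual}) - utility(without)
    return total / n_samples
```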


Near-optimal Bayesian Solution For Unknown Discrete Markov Decision Process

arXiv.org Artificial Intelligence

We tackle the problem of acting in an unknown finite and discrete Markov Decision Process (MDP) for which the expected shortest path from any state to any other state is bounded by a finite number $D$. An MDP consists of $S$ states and $A$ possible actions per state. Upon choosing an action $a_t$ at state $s_t$, one receives a real-valued reward $r_t$ and transitions to a next state $s_{t+1}$. The reward $r_t$ is generated from a fixed reward distribution depending only on $(s_t, a_t)$ and, similarly, the next state $s_{t+1}$ is generated from a fixed transition distribution depending only on $(s_t, a_t)$. The objective is to maximize the accumulated rewards after $T$ interactions. In this paper, we consider the case where the reward distributions, the transitions, $T$ and $D$ are all unknown. We derive the first polynomial-time Bayesian algorithm, BUCRL, that achieves, up to logarithmic factors, a regret (i.e., the difference between the accumulated rewards of the optimal policy and our algorithm) of the optimal order $\tilde{\mathcal{O}}(\sqrt{DSAT})$. Importantly, our result holds with high probability for the worst-case (frequentist) regret and not the weaker notion of Bayesian regret. We perform experiments in a variety of environments that demonstrate the superiority of our algorithm over previous techniques. Our work also establishes several results of independent interest. In particular, we derive a sharper upper bound for the KL-divergence of Bernoulli random variables. We also derive sharper upper and lower bounds for Beta and Binomial quantiles. All the bounds are very simple and only use elementary functions.
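
One Bayesian-optimism building block of this kind of algorithm can be sketched as follows: use an upper quantile of a Beta posterior as an optimistic estimate of an unknown Bernoulli reward mean. This is only a generic illustration under an assumed Beta(1, 1) prior; BUCRL's actual confidence levels, bonuses and quantile bounds are not reproduced here.

```python
from scipy.stats import beta

def optimistic_bernoulli_mean(successes, failures, delta):
    """Upper (1 - delta) quantile of a Beta(1 + successes, 1 + failures)
    posterior, used as an optimistic estimate of a Bernoulli reward mean.
    Generic sketch only; not the exact construction used by BUCRL."""
    return beta.ppf(1.0 - delta, 1 + successes, 1 + failures)

# Example: after 8 successes and 2 failures, a 95% optimistic estimate.
print(optimistic_bernoulli_mean(8, 2, 0.05))
```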


Epistemic Risk-Sensitive Reinforcement Learning

arXiv.org Artificial Intelligence

We develop a framework for interacting with uncertain environments in reinforcement learning (RL) by leveraging preferences expressed in the form of utility functions. We claim that there is value in considering different risk measures during learning. In this framework, the preference for risk can be tuned by varying the parameter $\beta$, and the resulting behavior can be risk-averse, risk-neutral or risk-taking depending on the parameter choice. We evaluate our framework on learning problems with model uncertainty. We measure and control for \emph{epistemic} risk using dynamic programming (DP) and policy gradient-based algorithms. The risk-averse behavior is then compared with the behavior of the optimal risk-neutral policy in environments with epistemic risk.
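
One standard way to encode such a single-parameter risk preference is the exponential (entropic) utility; the sketch below applies it to returns obtained under MDP models sampled from the posterior, which is one way to capture epistemic risk. The exact formulation in the paper may differ.

```python
import numpy as np
from scipy.special import logsumexp

def entropic_risk(sampled_returns, beta):
    """Exponential-utility (entropic) value of returns sampled under different
    posterior models: U_beta = (1/beta) * log E[exp(beta * X)].
    beta < 0 gives risk-averse, beta > 0 risk-taking, and beta -> 0 recovers
    the risk-neutral mean.  Sketch only; not the paper's exact objective."""
    x = np.asarray(sampled_returns, dtype=float)
    if abs(beta) < 1e-12:
        return float(x.mean())
    # Numerically stable log-mean-exp.
    return float((logsumexp(beta * x) - np.log(len(x))) / beta)
```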


Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities

arXiv.org Artificial Intelligence

We study model-based reinforcement learning in an unknown finite communicating Markov decision process. We propose a simple algorithm that leverages variance-based confidence intervals. We show that the proposed algorithm, UCRL-V, achieves the optimal regret $\tilde{\mathcal{O}}(\sqrt{DSAT})$ up to logarithmic factors, and so our work closes the gap with the lower bound without additional assumptions on the MDP. We perform experiments in a variety of environments that validate the theoretical bounds and show that UCRL-V outperforms state-of-the-art algorithms.
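
The variance-based interval behind this style of optimism is typically an empirical Bernstein bound; a minimal sketch, using the standard Maurer-Pontil form for samples in [0, 1] (whose constants may differ from those used in the paper), is:

```python
import numpy as np

def empirical_bernstein_radius(samples, delta):
    """Confidence radius for the mean of [0, 1]-bounded samples that holds with
    probability at least 1 - delta and scales with the empirical variance."""
    x = np.asarray(samples, dtype=float)
    n = len(x)
    if n < 2:
        return float("inf")
    var = x.var(ddof=1)                     # sample variance
    log_term = np.log(2.0 / delta)
    return np.sqrt(2.0 * var * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))
```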


Randomised Bayesian Least-Squares Policy Iteration

arXiv.org Artificial Intelligence

We introduce Bayesian least-squares policy iteration (BLSPI), an off-policy, model-free, policy iteration algorithm that uses the Bayesian least-squares temporal-difference (BLSTD) learning algorithm to evaluate policies. We also propose an online variant of BLSPI, called randomised BLSPI (RBLSPI), which improves its policy based on an incomplete policy evaluation step. In the online setting, the exploration-exploitation dilemma must be addressed, since we try to discover the optimal policy using samples we collect ourselves. RBLSPI exploits BLSTD's ability to quantify our uncertainty about the value function. Inspired by Thompson sampling, RBLSPI first samples a value function from a posterior distribution over value functions, and then selects actions based on the sampled value function. The effectiveness and the exploration abilities of RBLSPI are demonstrated experimentally in several environments.
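
The action-selection step can be sketched as follows, assuming a linear Q-function whose Gaussian weight posterior (mean and covariance) comes from BLSTD; the names below are placeholders rather than the paper's notation.

```python
import numpy as np

def thompson_action(state, actions, features, w_mean, w_cov, rng=None):
    """Sample one value function (a weight vector of a linear Q-function) from
    its Gaussian posterior and act greedily with respect to it."""
    rng = rng or np.random.default_rng()
    w = rng.multivariate_normal(w_mean, w_cov)          # sampled value function
    return max(actions, key=lambda a: features(state, a) @ w)
```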


Deeper & Sparser Exploration

arXiv.org Machine Learning

We address the problem of efficient exploration by proposing a new meta-algorithm in the context of model-based online planning for Bayesian Reinforcement Learning (BRL). We beat the state of the art while remaining computationally faster, in some cases by two orders of magnitude. This is the first optimism-free BRL algorithm to beat all previous state-of-the-art methods in tabular RL. The main novelty is the use of a candidate policy generator to generate long-term options in the belief tree, which allows us to create much sparser and deeper trees. We present results on many standard environments and empirically demonstrate its performance.
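
A toy reading of the candidate-policy idea: rather than branching over primitive actions at every belief-tree node, score a small set of candidate policies by rollouts in MDPs sampled from the current belief and commit to the best one as a long-term option. The sketch below only illustrates this selection step with placeholder names; the paper's tree construction is considerably more involved.

```python
import numpy as np

def select_option(belief_sample, candidate_policies, simulate_return,
                  n_rollouts=20, horizon=50):
    """Pick the candidate policy with the best average rollout return.

    belief_sample()             -> an MDP model drawn from the posterior belief.
    simulate_return(mdp, pi, h) -> return of running policy pi for h steps in mdp.
    Both are placeholders for this illustration."""
    scores = [np.mean([simulate_return(belief_sample(), pi, horizon)
                       for _ in range(n_rollouts)])
              for pi in candidate_policies]
    return candidate_policies[int(np.argmax(scores))]
```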


On The Differential Privacy of Thompson Sampling With Gaussian Prior

arXiv.org Artificial Intelligence

We show that Thompson Sampling with a Gaussian prior, as detailed in Algorithm 2 of (Agrawal & Goyal, 2013), is already differentially private. Theorem 1 shows that it enjoys a very competitive privacy loss of only $\mathcal{O}(\ln^2 T)$ after $T$ rounds. Theorem 2 then shows that one can control the privacy loss to any desirable $\epsilon$ level by appropriately increasing the variance of the samples from the Gaussian posterior, which increases the regret only by a term of $\mathcal{O}(\frac{\ln^2 T}{\epsilon})$. This compares favorably to the previous result for Thompson Sampling in the literature (Mishra & Thakurta, 2015), which adds a term of $\mathcal{O}(\frac{K \ln^3 T}{\epsilon^2})$ to the regret in order to achieve the same privacy level. Furthermore, our result uses basic Thompson Sampling with only minor modifications, whereas the result of (Mishra & Thakurta, 2015) requires sophisticated constructions.
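
A sketch of the mechanism (not the paper's exact constants): Gaussian-posterior Thompson Sampling for bandits roughly in the style of Algorithm 2 of Agrawal & Goyal (2013), with an `inflation` factor on the sampling variance standing in for the epsilon-dependent increase used to tune the privacy level.

```python
import numpy as np

def gaussian_ts_arm(counts, reward_sums, inflation=1.0, rng=None):
    """Pick an arm by sampling each arm's mean from a Gaussian posterior whose
    variance shrinks roughly as 1/(n_i + 1).  Setting inflation > 1 widens the
    posterior, the knob the paper uses (with an epsilon-dependent value) to
    control the privacy loss; the exact scaling is not reproduced here."""
    rng = rng or np.random.default_rng()
    counts = np.asarray(counts, dtype=float)
    means = np.asarray(reward_sums, dtype=float) / (counts + 1.0)
    std = np.sqrt(inflation / (counts + 1.0))
    return int(np.argmax(rng.normal(means, std)))
```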


Nearly optimal exploration-exploitation decision thresholds

arXiv.org Artificial Intelligence

While trading off exploration and exploitation in reinforcement learning is hard in general, relatively simple solutions exist under some formulations. In this paper, we first derive upper bounds for the utility of selecting different actions in the multi-armed bandit setting. Unlike common statistical upper confidence bounds, these explicitly link the planning horizon, uncertainty, and the need for exploration. The resulting algorithm can be seen as a generalisation of the classical Thompson sampling algorithm. We experimentally test these algorithms, as well as $\epsilon$-greedy and value-of-perfect-information heuristics. Finally, we also introduce the idea of bagging for reinforcement learning: by employing a version of online bootstrapping, we can efficiently sample from an approximate posterior distribution.
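
The online-bootstrapping ("bagging") idea can be sketched for a K-armed bandit as follows: keep an ensemble of mean estimates per arm, weight each incoming reward with an independent Poisson(1) count per ensemble member, and at decision time act greedily with respect to one randomly chosen replicate. This illustrates the general technique, not the paper's specific algorithm or decision thresholds.

```python
import numpy as np

class BootstrapBandit:
    """Online-bootstrap exploration sketch for a K-armed bandit."""

    def __init__(self, n_arms, ensemble=20, rng=None):
        self.rng = rng or np.random.default_rng()
        self.sums = np.zeros((n_arms, ensemble))      # weighted reward sums
        self.weights = np.zeros((n_arms, ensemble))   # bootstrap sample counts

    def select(self):
        j = self.rng.integers(self.sums.shape[1])     # pick one replicate
        means = self.sums[:, j] / np.maximum(self.weights[:, j], 1.0)
        return int(np.argmax(means))

    def update(self, arm, reward):
        w = self.rng.poisson(1.0, size=self.sums.shape[1])  # online bootstrap weights
        self.sums[arm] += w * reward
        self.weights[arm] += w
```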


Multi-View Decision Processes: The Helper-AI Problem

Neural Information Processing Systems

We consider a two-player sequential game in which agents have the same reward function but may disagree on the transition probabilities of an underlying Markovian model of the world. By committing to play a specific policy, the agent with the correct model can steer the behavior of the other agent, and seek to improve utility. We model this setting as a multi-view decision process, which we use to formally analyze the positive effect of steering policies. Furthermore, we develop an algorithm for computing the agents' achievable joint policy, and we experimentally show that it can lead to a large utility increase when the agents' models diverge.
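
A compact sketch of the steering computation under stated assumptions: agent 1 commits to a stochastic policy, agent 2 best-responds under its own (possibly wrong) transition model, and the resulting joint behaviour is evaluated under the true model. The tensor shapes, the simultaneous-move and discounted-reward assumptions, and all names below are illustrative only, not the paper's formulation.

```python
import numpy as np

def steer_and_evaluate(P_true, P_believed, R, pi1, gamma=0.95, iters=500):
    """P[s, a1, a2, s'] transition tensors, R[s, a1, a2] rewards,
    pi1[s, a1] agent 1's committed policy (all placeholder shapes)."""
    S = P_true.shape[0]
    # Agent 2 plans in the MDP induced by pi1 under its believed model.
    P2 = np.einsum('ij,ijkl->ikl', pi1, P_believed)    # P2[s, a2, s']
    R2 = np.einsum('ij,ijk->ik', pi1, R)               # R2[s, a2]
    V = np.zeros(S)
    for _ in range(iters):
        V = (R2 + gamma * P2 @ V).max(axis=1)
    pi2 = (R2 + gamma * P2 @ V).argmax(axis=1)         # agent 2's best response
    # Evaluate the joint policy under the true dynamics.
    P_joint = np.stack([np.einsum('j,jl->l', pi1[s], P_true[s, :, pi2[s], :])
                        for s in range(S)])            # P_joint[s, s']
    R_joint = np.array([pi1[s] @ R[s, :, pi2[s]] for s in range(S)])
    return np.linalg.solve(np.eye(S) - gamma * P_joint, R_joint)
```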


Subjective fairness: Fairness is in the eye of the beholder

arXiv.org Machine Learning

We analyze different notions of fairness in decision making when the underlying model is not known with certainty. We argue that recent notions of fairness in machine learning need to be modified to incorporate uncertainties about model parameters. We introduce the notion of {\em subjective fairness} as a suitable candidate for fair Bayesian decision making rules, relate this definition with existing ones, and experimentally demonstrate the inherent accuracy-fairness tradeoff under this definition.
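
One way to read the Bayesian flavour of this idea in code: evaluate the chosen fairness criterion in expectation over posterior samples of the model parameters rather than at a single point estimate. The function and argument names are placeholders; the paper's formal definition is richer than this sketch.

```python
import numpy as np

def expected_fairness_violation(posterior_samples, decision_rule, violation):
    """Average a fairness-violation measure over sampled model parameters.

    posterior_samples       -> iterable of sampled parameters theta.
    violation(rule, theta)  -> non-negative violation of the chosen fairness
                               notion under model theta (placeholder name)."""
    return float(np.mean([violation(decision_rule, theta)
                          for theta in posterior_samples]))
```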