AITopics | exp4

Collaborating Authors

exp4

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

We study the problem of nonstochastic bandits with infinitely many experts: A learner aims to maximize the total reward by taking actions sequentially based on bandit feedback while benchmarking against a countably infinite set of experts. We propose a variant of Exp4.P that, for finitely many experts, enables inference of correct expert rankings while preserving the order of the regret upper bound. We then incorporate the variant into a meta-algorithm that works on infinitely many experts. We prove a high-probability upper bound of $\tilde{\mathcal{O}} \big( i^*K + \sqrt{KT} \big)$ on the regret, up to polylog factors, where $i^*$ is the unknown position of the best expert, $K$ is the number of actions, and $T$ is the time horizon. We also provide an example of structured experts and discuss how to expedite learning in such case. Our meta-learning algorithm achieves the tightest regret upper bound for the setting considered when $i^* = \tilde{\mathcal{O}} \big( \sqrt{T/K} \big)$. If a prior distribution is assumed to exist for $i^*$, the probability of satisfying a tight regret bound increases with $T$, the rate of which can be fast.

exp4, nonstochastic bandit, probability, (12 more...)

arXiv.org Machine Learning

2102.05164

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Food & Agriculture > Agriculture (0.46)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

Add feedback

When Humans and Machines Make Joint Decisions: A Non-Symmetric Bandit Model

Bordt, Sebastian, von Luxburg, Ulrike

arXiv.org Machine LearningJul-9-2020

As machine learning algorithms have become on par or even superior to humans in a number of decision making problems (Rajpurkar et al., 2017; Silver et al., 2018), the idea that humans might be assisted by computer programs across a large variety of tasks has gained momentum. Already today, automated decision making systems are being used to predict cardiac arrests, credit scores, and recidivism (Tonekaboni et al., 2018; Board of Governors, 2007; Angwin et al., 2016). An emerging literature asks how humans, who often remain the final decision makers, can and should interact with such systems (Tonekaboni et al., 2019; Lucic et al., 2020). Machine learners, in turn, want to understand how algorithms should be designed such that interaction with humans is as fruitful as possible (Carroll et al., 2019). In most real-world decision making problems, humans have access to information that is unobservable to any algorithm. In the medical domain, doctors can obtain important information from personal interaction with patients (Goldenberg and Engelhardt, 2019). In judicial bail, judges may base their decision on the behavior of the defendant in the courtroom (Lakkaraju et al., 2017). From a machine learning point of view, such unobserved variables arise not due to a failure of the algorithm's designer to collect them. Instead, it is a property of many real-world decision problems that formulating all relevant aspects as inputs to an algorithm is impossible. 1

data mining, machine learning, player 2, (17 more...)

arXiv.org Machine Learning

2007.048

Country:

North America > United States (0.28)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.69)

Add feedback

Piecewise-Stationary Off-Policy Optimization

Hong, Joey, Kveton, Branislav, Zaheer, Manzil, Chow, Yinlam, Ahmed, Amr

arXiv.org Artificial IntelligenceJun-15-2020

Off-policy learning is a framework for evaluating and optimizing policies without deploying them, from data collected by another policy. Real-world environments are typically non-stationary and the offline learned policies should adapt to these changes. To address this challenge, we study the novel problem of off-policy optimization in piecewise-stationary contextual bandits. Our proposed solution has two phases. In the offline learning phase, we partition logged data into categorical latent states and learn a near-optimal sub-policy for each state. In the online deployment phase, we adaptively switch between the learned sub-policies based on their performance. This approach is practical and analyzable, and we provide guarantees on both the quality of off-policy optimization and the regret during online deployment. To show the effectiveness of our approach, we compare it to state-of-the-art baselines on both synthetic and real-world datasets. Our approach outperforms methods that act only on observed context.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2006.08236

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Minimax Optimal Algorithms for Adversarial Bandit Problem with Multiple Plays

Vural, N. Mert, Gokcesu, Hakan, Gokcesu, Kaan, Kozat, Suleyman S.

arXiv.org Artificial IntelligenceNov-25-2019

We investigate the adversarial bandit problem with multiple plays under semi-bandit feedback. We introduce a highly efficient algorithm that asymptotically achieves the performance of the best switching $m$-arm strategy with minimax optimal regret bounds. To construct our algorithm, we introduce a new expert advice algorithm for the multiple-play setting. By using our expert advice algorithm, we additionally improve the best-known high-probability bound for the multi-play setting by $O(\sqrt{m})$. Our results are guaranteed to hold in an individual sequence manner since we have no statistical assumption on the bandit arm gains. Through an extensive set of experiments involving synthetic and real data, we demonstrate significant performance gains achieved by the proposed algorithm with respect to the state-of-the-art algorithms.

algorithm, exp3, exp4, (15 more...)

arXiv.org Artificial Intelligence

1911.11122

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Republic of Türkiye > Ankara Province > Ankara (0.04)
Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.71)

Add feedback

Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting

Krishnamurthy, Akshay, Langford, John, Slivkins, Aleksandrs, Zhang, Chicheng

arXiv.org Machine LearningFeb-4-2019

We study contextual bandit learning with an abstract policy class and continuous action space. We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent "zooming" behavior and, with no tuning, yield improved guarantees for benign problems. We also study adapting to unknown smoothness parameters, establishing a price-of-adaptivity and deriving optimal adaptive algorithms that require no additional information.

algorithm, assumption, dimension, (17 more...)

arXiv.org Machine Learning

1902.0152

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Add feedback

Risk-Aware Algorithms for Adversarial Contextual Bandits

Sun, Wen, Dey, Debadeepta, Kapoor, Ashish

arXiv.org Machine LearningOct-17-2016

In this work we consider adversarial contextual bandits with risk constraints. At each round, nature prepares a context, a cost for each arm, and additionally a risk for each arm. The learner leverages the context to pull an arm and then receives the corresponding cost and risk associated with the pulled arm. In addition to minimizing the cumulative cost, the learner also needs to satisfy long-term risk constraints -- the average of the cumulative risk from all pulled arms should not be larger than a pre-defined threshold. To address this problem, we first study the full information setting where in each round the learner receives an adversarial convex loss and a convex constraint. We develop a meta algorithm leveraging online mirror descent for the full information setting and extend it to contextual bandit with risk constraints setting using expert advice. Our algorithms can achieve near-optimal regret in terms of minimizing the total cost, while successfully maintaining a sublinear growth of cumulative risk constraint violation.

constraint, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

1610.05129

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.47)

Add feedback