AITopics | observable game

observable game

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems, Oct-2-2025, 22:01:28 GMT

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The authors give an algorithm for easy partial-monitoring games, i.e., games that satisfy the local observability condition of Bartók et al. Their algorithm, BPM, attains the $O(\sqrt{T})$ rate, which is minimax optimal for such games. Originality and Significance: There are already algorithms that attain $O(\sqrt{T})$ regret for easy partial-monitoring games. Indeed, the authors compare themselves against the CBP algorithm of Bartók et al.

algorithm, experiment, outcome distribution

Country: North America > Canada > Quebec > Montreal (0.05)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.50)

Exploration by Optimization with Hybrid Regularizers: Logarithmic Regret with Adversarial Robustness in Partial Monitoring

Tsuchiya, Taira; Ito, Shinji; Honda, Junya

arXiv.org Machine Learning, Feb-13-2024

Partial monitoring is a generic framework of online decision-making problems with limited observations. To make decisions from such limited observations, it is necessary to find an appropriate distribution for exploration. Recently, a powerful approach for this purpose, exploration by optimization (ExO), was proposed; it achieves optimal bounds in adversarial environments with follow-the-regularized-leader for a wide range of online decision-making problems. However, a naive application of ExO in stochastic environments significantly degrades regret bounds. To resolve this problem in locally observable games, we first establish a novel framework and analysis for ExO with a hybrid regularizer. This development allows us to significantly improve the existing regret bounds of best-of-both-worlds (BOBW) algorithms, which achieve nearly optimal bounds in both stochastic and adversarial environments. In particular, we derive a stochastic regret bound of $O(\sum_{a \neq a^*} k^2 m^2 \log T / \Delta_a)$, where $k$, $m$, and $T$ are the numbers of actions, observations, and rounds, $a^*$ is an optimal action, and $\Delta_a$ is the suboptimality gap for action $a$. This bound is roughly $\Theta(k^2 \log T)$ times smaller than existing BOBW bounds. In addition, for globally observable games, we provide a new BOBW algorithm with the first $O(\log T)$ stochastic bound.
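
The generic partial monitoring protocol that these abstracts build on can be sketched in a few lines. This is a minimal illustration, not code from any of the listed papers: it assumes a finite game given by a loss matrix `loss[a][x]` and a feedback matrix `feedback[a][x]` (action `a`, outcome `x`), and a hypothetical `policy` callable that sees only past actions and feedback symbols, never the losses.

```python
def run_partial_monitoring(loss, feedback, outcomes, policy):
    """Play a finite partial monitoring game and return the learner's regret.

    loss[a][x]:     loss of action a under outcome x (never revealed to the learner)
    feedback[a][x]: symbol the learner observes after playing a under outcome x
    outcomes:       outcome sequence chosen by the environment
    policy:         hypothetical learner mapping the observed history to an action
    """
    history = []  # (action, feedback symbol) pairs: all the learner ever sees
    total_loss = 0.0
    for x in outcomes:
        a = policy(history)
        total_loss += loss[a][x]  # suffered, but not observed
        history.append((a, feedback[a][x]))
    # Regret is measured against the best fixed action in hindsight.
    best_fixed = min(sum(row[x] for x in outcomes) for row in loss)
    return total_loss - best_fixed


# Toy 2-action, 2-outcome game (numbers chosen only for illustration).
LOSS = [[0.0, 1.0],
        [1.0, 0.0]]
FEEDBACK = [[0, 0],   # action 0 reveals nothing about the outcome
            [0, 1]]   # action 1 reveals the outcome
print(run_partial_monitoring(LOSS, FEEDBACK, [0, 0, 1, 0], lambda h: 0))  # 0.0
```

Playing action 1 every round on the same sequence gives a regret of 2.0, yet only action 1 is informative; balancing informative and low-loss actions is what the exploration distribution mentioned in the abstract has to do.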

adversarial environment, algorithm, stochastic environment

2402.08321

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Best-of-Both-Worlds Algorithms for Partial Monitoring

Tsuchiya, Taira; Ito, Shinji; Honda, Junya

arXiv.org Artificial Intelligence, Oct-9-2022

This study considers the partial monitoring problem with $k$ actions and $d$ outcomes and provides the first best-of-both-worlds algorithms, whose regrets are favorably bounded in both the stochastic and adversarial regimes. In particular, we show that for non-degenerate locally observable games, the regret is $O(m^2 k^4 \log(T) \log(k_{\Pi} T) / \Delta_{\min})$ in the stochastic regime and $O(m k^{2/3} \sqrt{T \log(T) \log k_{\Pi}})$ in the adversarial regime, where $T$ is the number of rounds, $m$ is the maximum number of distinct observations per action, $\Delta_{\min}$ is the minimum suboptimality gap, and $k_{\Pi}$ is the number of Pareto optimal actions. Moreover, we show that for globally observable games, the regret is $O(c_{\mathcal{G}}^2 \log(T) \log(k_{\Pi} T) / \Delta_{\min}^2)$ in the stochastic regime and $O((c_{\mathcal{G}}^2 \log(T) \log(k_{\Pi} T))^{1/3} T^{2/3})$ in the adversarial regime, where $c_{\mathcal{G}}$ is a game-dependent constant. We also provide regret bounds for a stochastic regime with adversarial corruptions. Our algorithms are based on the follow-the-regularized-leader framework and are inspired by the approach of exploration by optimization and the adaptive learning rate in the field of online learning with feedback graphs.
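
To see how the globally observable bounds above behave as the horizon grows, one can evaluate the expressions inside the $O(\cdot)$. This sketch drops the hidden constants, so its outputs are not regret values, only a way to compare growth in $T$; the parameter values in the usage loop are arbitrary assumptions.

```python
import math


def bobw_global_bounds(T, c_g, k_pi, delta_min):
    """Expressions inside the O(.) of the globally observable bounds (constants dropped).

    T: number of rounds, c_g: game-dependent constant c_G,
    k_pi: number of Pareto optimal actions, delta_min: minimum suboptimality gap.
    """
    x = c_g ** 2 * math.log(T) * math.log(k_pi * T)
    stochastic = x / delta_min ** 2            # c_G^2 log T log(k_Pi T) / Delta_min^2
    adversarial = x ** (1 / 3) * T ** (2 / 3)  # (c_G^2 log T log(k_Pi T))^{1/3} T^{2/3}
    return stochastic, adversarial


for T in (10**4, 10**6, 10**8):
    s, a = bobw_global_bounds(T, c_g=1.0, k_pi=2, delta_min=0.1)
    print(f"T={T:>9}: stochastic ~ {s:,.0f}, adversarial ~ {a:,.0f}")
```

The stochastic expression grows only polylogarithmically in $T$, while the adversarial one grows like $T^{2/3}$ up to logarithmic factors, so the stochastic bound becomes the far smaller of the two once $T$ is large.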

data mining, machine learning, regime

2207.1455

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.34)

An Approach to Partial Observability in Games: Learning to Both Act and Observe

Gilmour, Elizabeth; Plotkin, Noah; Smith, Leslie

arXiv.org Artificial Intelligence, Aug-11-2021

Reinforcement learning (RL) is successful at learning to play games where the entire environment is visible. However, RL approaches are challenged in complex games like StarCraft II and in real-world environments where the entire environment is not visible. In these more complex games with more limited visual information, agents must choose where to look and how to optimally use their limited visual information in order to succeed at the game. We verify that, with a relatively simple model, the agent can learn where to look in scenarios with limited visual bandwidth. We develop a method for masking part of the environment in Atari games that forces the RL agent to learn both where to look and how to play the game, which lets us study where the RL agent learns to look. In addition, we develop a neural network architecture and method that allow the agent to choose both where to look and what action to take in the Pong game. Further, we analyze the strategies the agent learns to better understand how the RL agent learns to play the game.
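
The masking idea can be illustrated on a single frame. The abstract does not specify the mask shape, so this sketch assumes a square visible window centred on the location the agent chose to look at, with everything else zeroed; `mask_frame` and its `radius` parameter (a stand-in for visual bandwidth) are hypothetical.

```python
import numpy as np


def mask_frame(frame, center_row, center_col, radius):
    """Return a copy of `frame` in which only a square window stays visible.

    frame: (H, W) or (H, W, C) array, e.g. a preprocessed Atari screen.
    (center_row, center_col): where the agent chose to look.
    radius: half-width of the visible window (hypothetical bandwidth knob).
    """
    masked = np.zeros_like(frame)
    h, w = frame.shape[:2]
    # Clip the window to the frame so looking near an edge still works.
    r0, r1 = max(0, center_row - radius), min(h, center_row + radius + 1)
    c0, c1 = max(0, center_col - radius), min(w, center_col + radius + 1)
    masked[r0:r1, c0:c1] = frame[r0:r1, c0:c1]
    return masked


frame = np.arange(25).reshape(5, 5)
print(mask_frame(frame, center_row=0, center_col=4, radius=1))
```

In a setup like the one described, the agent's output would select both the look location and a game action at each step; how the paper's architecture factorizes that choice is not shown here.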

agent, information, visual information

2108.05701

Country: North America > United States > District of Columbia > Washington (0.05)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Information Directed Sampling for Linear Partial Monitoring

Kirschner, Johannes; Lattimore, Tor; Krause, Andreas

arXiv.org Machine Learning, Feb-25-2020

Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well-known bandit models, including linear, combinatorial and dueling bandits. We introduce information directed sampling (IDS) for stochastic partial monitoring with a linear reward and observation structure. IDS achieves adaptive worst-case regret rates that depend on precise observability conditions of the game. Moreover, we prove lower bounds that classify the minimax regret of all finite games into four possible regimes. IDS achieves the optimal rate in all cases up to logarithmic factors, without tuning any hyperparameters. We further extend our results to the contextual and kernelized settings, which significantly increases the range of possible applications.
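
The core step of information directed sampling is choosing an action distribution that minimizes the information ratio: the squared expected gap divided by the expected information gain. The sketch below assumes that per-action gap estimates `gaps[a]` and information gains `info[a]` are already available (computing them in the linear partial monitoring setting is the substance of the paper and is not shown), and relies on the known property that a minimizer exists among mixtures of at most two actions, searched here on a grid of mixing weights.

```python
def ids_distribution(gaps, info, grid=1000):
    """Approximately minimize (sum_a p_a * gaps[a])**2 / (sum_a p_a * info[a]).

    Searches mixtures q * e_i + (1 - q) * e_j over all action pairs (i, j) and a
    grid of q values, and returns the best distribution as a list of probabilities.
    gaps, info: hypothetical precomputed gap estimates and information gains.
    """
    n = len(gaps)
    best_ratio, best = float("inf"), None
    for i in range(n):
        for j in range(n):
            for step in range(grid + 1):
                q = step / grid
                gap = q * gaps[i] + (1 - q) * gaps[j]
                gain = q * info[i] + (1 - q) * info[j]
                if gain <= 0:  # no information: the ratio is undefined, skip
                    continue
                ratio = gap ** 2 / gain
                if ratio < best_ratio:
                    best_ratio, best = ratio, (i, j, q)
    if best is None:
        raise ValueError("every action has zero information gain")
    i, j, q = best
    probs = [0.0] * n
    probs[i] += q
    probs[j] += 1 - q
    return probs


# Action 0 looks good but is barely informative; action 1 is costly but informative.
print([round(p, 3) for p in ids_distribution(gaps=[0.1, 1.0], info=[0.01, 1.0])])  # [0.909, 0.091]
```

Neither action alone is best here: each has ratio 1.0, while the mixture reaches about 0.33, which shows why IDS randomizes between a greedy and an informative action.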

bandit, information directed sampling, observable game

2002.11182

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Switzerland > Zürich > Zürich (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)

Partially Observable Games for Secure Autonomy

Ahmadi, Mohamadreza; Viswanathan, Arun A.; Ingham, Michel D.; Tan, Kymie; Ames, Aaron D.

arXiv.org Artificial Intelligence, Feb-5-2020

Technology development efforts in autonomy and cyber-defense have evolved independently of each other over the past decade. In this paper, we report our ongoing effort to integrate these two currently distinct areas into a single framework. To this end, we propose the two-player partially observable stochastic game (POSG) formalism to capture both high-level autonomous mission planning under uncertainty and adversarial decision making subject to imperfect information. We show that synthesizing sub-optimal strategies for such games is possible under finite-memory assumptions for both the autonomous decision maker and the cyber-adversary. We then describe an experimental testbed to evaluate the efficacy of the proposed framework.
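
A finite-memory strategy, the kind of object the abstract refers to, can be represented as a small state machine: a finite set of memory states, an action chosen in each state, and a memory update driven by observations. This sketch only shows that representation; it is not the paper's synthesis procedure, and the deterministic form, state names, observations, and actions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Tuple


@dataclass
class FiniteMemoryStrategy:
    """Deterministic finite-memory strategy for one player of a partially observable game.

    action_of[m]:          action played while in memory state m
    next_memory[(m, obs)]: memory state after observing obs in state m
    memory:                current memory state (starts at the initial state)
    """
    action_of: Dict[Hashable, Hashable]
    next_memory: Dict[Tuple[Hashable, Hashable], Hashable]
    memory: Hashable

    def act(self):
        return self.action_of[self.memory]

    def observe(self, obs):
        # The strategy never sees the true game state, only its own observations.
        self.memory = self.next_memory[(self.memory, obs)]


# Hypothetical two-state strategy for the autonomous decision maker.
agent = FiniteMemoryStrategy(
    action_of={"calm": "continue_mission", "alert": "run_diagnostics"},
    next_memory={
        ("calm", "nominal"): "calm", ("calm", "anomaly"): "alert",
        ("alert", "nominal"): "calm", ("alert", "anomaly"): "alert",
    },
    memory="calm",
)
for obs in ["nominal", "anomaly", "nominal"]:
    print(agent.act())
    agent.observe(obs)
```

Bounding the number of memory states is what makes such a strategy finite; the paper's result concerns synthesizing strategies of this kind, which this sketch does not attempt.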

artificial intelligence, machine learning, posg

2002.01969

Country:

North America > United States > California > Los Angeles County > Pasadena (0.05)
Europe > Netherlands > Gelderland > Nijmegen (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Exploration by Optimisation in Partial Monitoring

Lattimore, Tor; Szepesvari, Csaba

arXiv.org Machine Learning, Jul-12-2019

We provide a simple and efficient algorithm for adversarial $k$-action $d$-outcome non-degenerate locally observable partial monitoring games for which the $n$-round minimax regret is bounded by $3(d+1) k^{3/2} \sqrt{8n \log(k)}$, matching the best known information-theoretic upper bounds.
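
Because this bound has explicit constants, it can be evaluated directly. The function below simply computes $3(d+1) k^{3/2} \sqrt{8n \log(k)}$, assuming a natural logarithm; the example values of $n$, $k$, and $d$ are arbitrary.

```python
import math


def exo_regret_bound(n, k, d):
    """Minimax regret bound 3 (d + 1) k^{3/2} sqrt(8 n log k) for n rounds,
    k actions and d outcomes (natural logarithm assumed)."""
    return 3 * (d + 1) * k ** 1.5 * math.sqrt(8 * n * math.log(k))


bound = exo_regret_bound(n=10**6, k=2, d=2)
print(f"{bound:,.0f} over 10**6 rounds, i.e. {bound / 10**6:.1%} average regret per round")
```

Since the bound scales with $\sqrt{n}$, quadrupling the horizon only doubles it, so the average regret per round shrinks as $n$ grows.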

artificial intelligence, data mining, machine learning

1907.05772

Country:

North America > United States (0.68)
Europe (0.46)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Cleaning up the neighborhood: A full classification for adversarial partial monitoring

Lattimore, Tor; Szepesvari, Csaba

arXiv.org Machine Learning, May-23-2018

Partial monitoring is a generalization of the well-known multi-armed bandit framework in which the loss is not directly observed by the learner. We complete the classification of finite adversarial partial monitoring to include all games, solving an open problem posed by Bartók et al. [2014]. Along the way, we simplify and improve existing algorithms and correct errors in previous analyses. Our second contribution is a new algorithm for the class of games studied by Bartók [2013], for which we prove upper and lower regret bounds that shed more light on the dependence of the regret on the game structure.
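
The complete classification can be summarized as a lookup from game structure to minimax regret rate: trivial games have zero regret, easy (locally observable) games $\Theta(\sqrt{T})$ up to logarithmic factors, hard (globally but not locally observable) games $\Theta(T^{2/3})$, and hopeless games $\Theta(T)$. This sketch assumes the structural properties are computed elsewhere (deciding local and global observability from the loss and feedback matrices is not shown), and its flag names are hypothetical.

```python
import math

# Minimax regret growth for each class, up to constants (and log factors for "easy").
RATES = {
    "trivial": lambda T: 0.0,
    "easy": lambda T: math.sqrt(T),
    "hard": lambda T: T ** (2 / 3),
    "hopeless": lambda T: float(T),
}


def classify(has_neighbors, locally_observable, globally_observable):
    """Map precomputed structural properties of a finite game to its class.

    has_neighbors: whether the game has at least one pair of neighboring actions.
    locally_observable, globally_observable: the game's observability conditions.
    """
    if not has_neighbors:
        return "trivial"
    if locally_observable:
        return "easy"
    if globally_observable:
        return "hard"
    return "hopeless"


cls = classify(has_neighbors=True, locally_observable=False, globally_observable=True)
print(cls, round(RATES[cls](10**6)))
```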

artificial intelligence, data mining, machine learning

1805.09247

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.48)

An Adaptive Algorithm for Finite Stochastic Partial Monitoring

Bartok, Gabor; Zolghadr, Navid; Szepesvari, Csaba

arXiv.org Machine Learning, Jun-27-2012

We present a new anytime algorithm that achieves near-optimal regret for any instance of finite stochastic partial monitoring. In particular, the new algorithm achieves the minimax regret, within logarithmic factors, for both "easy" and "hard" problems. For easy problems, it additionally achieves logarithmic individual regret. Most importantly, the algorithm is adaptive in the sense that if the opponent strategy is in an "easy region" of the strategy space, then the regret grows as if the problem were easy. As an implication, we show that under some reasonable additional assumptions, the algorithm enjoys $O(\sqrt{T})$ regret in Dynamic Pricing, proven to be hard by Bartók et al. (2011).
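
Dynamic pricing fits the partial monitoring template because the seller sees only whether a sale happened, not the buyer's valuation. The encoding below is a sketch under stated assumptions: prices and valuations share a grid `0..n-1`, a sale (price at most the valuation) costs the unrealized revenue `valuation - price`, and a missed sale costs a fixed constant `c`; the exact formulation used by Bartók et al. may differ.

```python
def dynamic_pricing_game(n, c):
    """Loss and feedback matrices for an n-level dynamic pricing game (assumed encoding).

    Row i = action (price offered), column j = outcome (buyer's valuation).
    """
    # Sale if price <= valuation: lose the revenue left on the table; otherwise lose c.
    loss = [[j - i if i <= j else c for j in range(n)] for i in range(n)]
    # The seller observes only whether the item sold (1) or not (0).
    feedback = [[1 if i <= j else 0 for j in range(n)] for i in range(n)]
    return loss, feedback


loss, feedback = dynamic_pricing_game(n=3, c=2)
for row_loss, row_fb in zip(loss, feedback):
    print(row_loss, row_fb)
```

Each feedback row is a threshold pattern, so a single price reveals only whether the valuation lies at or above it, never the valuation itself.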

algorithm, artificial intelligence, opponent strategy

1206.6487

Country: North America > Canada (0.46)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.93)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.70)
