AITopics | regret value

Collaborating Authors

regret value

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications

Xu, Luyue, Wang, Liming, Xie, Hong, Zhou, Mingqiang

arXiv.org Artificial IntelligenceAug-28-2024

Contextual bandits serve as a fundamental algorithmic framework for optimizing recommendation decisions online. Though extensive attention has been paid to tailoring contextual bandits for recommendation applications, the "herding effects" in user feedback have been ignored. These herding effects bias user feedback toward historical ratings, breaking down the assumption of unbiased feedback inherent in contextual bandits. This paper develops a novel variant of the contextual bandit that is tailored to address the feedback bias caused by the herding effects. A user feedback model is formulated to capture this feedback bias. We design the TS-Conf (Thompson Sampling under Conformity) algorithm, which employs posterior sampling to balance the exploration and exploitation tradeoff. We prove an upper bound for the regret of the algorithm, revealing the impact of herding effects on learning speed. Extensive experiments on datasets demonstrate that TS-Conf outperforms four benchmark algorithms. Analysis reveals that TS-Conf effectively mitigates the negative impact of herding effects, resulting in faster learning and improved recommendation accuracy.

algorithm, bandit, contextual bandit, (12 more...)

arXiv.org Artificial Intelligence

2408.14432

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.94)
Information Technology > Data Science (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

NNCFR: Minimize Counterfactual Regret with Neural Networks

Li, Huale, Wang, Xuan, Guo, Zengyue, Zhang, Jiajia, Qi, Shuhan

arXiv.org Artificial IntelligenceMay-26-2021

Counterfactual Regret Minimization (CFR)} is the popular method for finding approximate Nash equilibrium in two-player zero-sum games with imperfect information. CFR solves games by travsersing the full game tree iteratively, which limits its scalability in larger games. When applying CFR to solve large-scale games in previously, large-scale games are abstracted into small-scale games firstly. Secondly, CFR is used to solve the abstract game. And finally, the solution strategy is mapped back to the original large-scale game. However, this process requires considerable expert knowledge, and the accuracy of abstraction is closely related to expert knowledge. In addition, the abstraction also loses certain information, which will eventually affect the accuracy of the solution strategy. Towards this problem, a recent method, \textit{Deep CFR} alleviates the need for abstraction and expert knowledge by applying deep neural networks directly to CFR in full games. In this paper, we introduces \textit{Neural Network Counterfactual Regret Minimization (NNCFR)}, an improved variant of \textit{Deep CFR} that has a faster convergence by constructing a dueling netwok as the value network. Moreover, an evaluation module is designed by combining the value network and Monte Carlo, which reduces the approximation error of the value network. In addition, a new loss function is designed in the procedure of training policy network in the proposed \textit{NNCFR}, which can be good to make the policy network more stable. The extensive experimental tests are conducted to show that the \textit{NNCFR} converges faster and performs more stable than \textit{Deep CFR}, and outperforms \textit{Deep CFR} with respect to exploitability and head-to-head performance on test games.

cfr, information, value network, (16 more...)

arXiv.org Artificial Intelligence

2105.12328

Country:

North America > United States > Texas (0.05)
Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

CFR-MIX: Solving Imperfect Information Extensive-Form Games with Combinatorial Action Space

Li, Shuxin, Zhang, Youzhi, Wang, Xinrun, Xue, Wanqi, An, Bo

arXiv.org Artificial IntelligenceMay-18-2021

In many real-world scenarios, a team of agents coordinate with each other to compete against an opponent. The challenge of solving this type of game is that the team's joint action space grows exponentially with the number of agents, which results in the inefficiency of the existing algorithms, e.g., Counterfactual Regret Minimization (CFR). To address this problem, we propose a new framework of CFR: CFR-MIX. Firstly, we propose a new strategy representation that represents a joint action strategy using individual strategies of all agents and a consistency relationship to maintain the cooperation between agents. To compute the equilibrium with individual strategies under the CFR framework, we transform the consistency relationship between strategies to the consistency relationship between the cumulative regret values. Furthermore, we propose a novel decomposition method over cumulative regret values to guarantee the consistency relationship between the cumulative regret values. Finally, we introduce our new algorithm CFR-MIX which employs a mixing layer to estimate cumulative regret values of joint actions as a non-linear combination of cumulative regret values of individual actions. Experimental results show that CFR-MIX outperforms existing algorithms on various games significantly.

cumulative regret value, regret value, strategy representation, (13 more...)

arXiv.org Artificial Intelligence

2105.0844

Country:

Asia > Singapore (0.14)
North America > Canada > Alberta (0.14)
North America > United States > Texas (0.04)

Genre: Research Report (0.70)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)

Add feedback

Algorithms for Average Regret Minimization

Storandt, Sabine (University of Konstanz) | Funke, Stefan (University of Stuttgart)

AAAI ConferencesJul-11-2019

In this paper, we study a problem from the realm of multi-criteria decision making in which the goal is to select from a given set S of d-dimensional objects a minimum sized subset S' with bounded regret. Thereby, regret measures the unhappiness of users which would like to select their favorite object from set S but now can only select their favorite object from the subset S'. Previous work focused on bounding the maximum regret which is determined by the most unhappy user. We propose to consider the average regret instead which is determined by the sum of (un)happiness of all possible users. We show that this regret measure comes with desirable properties as supermodularity which allows to construct approximation algorithms. Furthermore, we introduce the regret minimizing permutation problem and discuss extensions of our algorithms to the recently proposed k-regret measure. Our theoretical results are accompanied with experiments on a variety of inputs with d up to 7.

algorithm, proceedings, subset, (11 more...)

AAAI Conferences

Twelfth Annual Symposium on Combinatorial Search

Country:

North America > United States > California > San Francisco County > San Francisco (0.15)
Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.05)
North America > United States > Hawaii (0.05)

Technology: Information Technology > Artificial Intelligence (0.38)

Add feedback

The Impatient May Use Limited Optimism to Minimize Regret

Cadilhac, Michaël, Pérez, Guillermo A., Bogaard, Marie van den

arXiv.org Artificial IntelligenceNov-17-2018

Discounted-sum games provide a formal model for the study of reinforcement learning, where the agent is enticed to get rewards early since later rewards are discounted. When the agent interacts with the environment, she may regret her actions, realizing that a previous choice was suboptimal given the behavior of the environment. The main contribution of this paper is a PSPACE algorithm for computing the minimum possible regret of a given game. To this end, several results of independent interest are shown. (1) We identify a class of regret-minimizing and admissible strategies that first assume that the environment is collaborating, then assume it is adversarial---the precise timing of the switch is key here. (2) Disregarding the computational cost of numerical analysis, we provide an NP algorithm that checks that the regret entailed by a given time-switching strategy exceeds a given value. (3) We show that determining whether a strategy minimizes regret is decidable in PSPACE.

machine learning, reinforcement learning, val, (17 more...)

arXiv.org Artificial Intelligence

1811.07146

Country: Europe > Czechia (0.28)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)

Add feedback

Sampling Based Approaches for Minimizing Regret in Uncertain Markov Decision Processes (MDPs)

Ahmed, Asrar, Varakantham, Pradeep, Lowalekar, Meghna, Adulyasak, Yossiri, Jaillet, Patrick

Journal of Artificial Intelligence ResearchJul-1-2017

Markov Decision Processes (MDPs) are an effective model to represent decision processes in the presence of transitional uncertainty and reward tradeoffs. However, due to the difficulty in exactly specifying the transition and reward functions in MDPs, researchers have proposed uncertain MDP models and robustness objectives in solving those models. Most approaches for computing robust policies have focused on the computation of maximin policies which maximize the value in the worst case amongst all realisations of uncertainty. Given the overly conservative nature of maximin policies, recent work has proposed minimax regret as an ideal alternative to the maximin objective for robust optimization. However, existing algorithms for handling minimax regret are restricted to models with uncertainty over rewards only and they are also limited in their scalability. Therefore, we provide a general model of uncertain MDPs that considers uncertainty over both transition and reward functions. Furthermore, we also consider dependence of the uncertainty across different states and decision epochs. We also provide a mixed integer linear program formulation for minimizing regret given a set of samples of the transition and reward functions in the uncertain MDP. In addition, we provide two myopic variants of regret, namely Cumulative Expected Myopic Regret (CEMR) and One Step Regret (OSR) that can be optimized in a scalable manner. Specifically, we provide dynamic programming and policy iteration based algorithms to optimize CEMR and OSR respectively. Finally, to demonstrate the effectiveness of our approaches, we provide comparisons on two benchmark problems from literature. We observe that optimizing the myopic variants of regret, OSR and CEMR are better than directly optimizing the regret.

algorithm, equation, regret value, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.5242

AI Access Foundation

11066

Journal of Artificial Intelligence Research

Country:

Asia > Singapore (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Indiana (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.85)

Add feedback