AITopics | opponent player

SPACE: Noise Contrastive Estimation Stabilizes Self-Play Fine-Tuning for Large Language Models

Wang, Yibo, Chen, Qing-Guo, Xu, Zhao, Luo, Weihua, Zhang, Kaifu, Zhang, Lijun

arXiv.org Artificial Intelligence, Dec-9-2025

Self-play fine-tuning has demonstrated promising abilities in adapting large language models (LLMs) to downstream tasks with limited real-world data. The basic principle is to iteratively refine the model with real samples and synthetic ones generated from itself. However, the existing methods primarily focus on the relative gaps between the rewards for two types of data, neglecting their absolute values. Through theoretical analysis, we identify that the gap-based methods suffer from unstable evolution, due to the potentially degenerated objectives. To address this limitation, we introduce a novel self-play fine-tuning method, namely Self-PlAy via Noise Contrastive Estimation (SPACE), which leverages noise contrastive estimation to capture the real-world data distribution. Specifically, SPACE treats synthetic samples as auxiliary components, and discriminates them from the real ones in a binary classification manner. As a result, SPACE independently optimizes the absolute reward values for each type of data, ensuring a consistently meaningful objective and thereby avoiding the instability issue. Theoretically, we show that the optimal solution of the objective in SPACE aligns with the underlying distribution of real-world data, and SPACE guarantees a provably stable convergence to the optimal distribution. Empirically, we show that SPACE significantly improves the performance of LLMs over various tasks, and outperforms supervised fine-tuning that employs much more real-world samples. Compared to gap-based self-play fine-tuning methods, SPACE exhibits remarkable superiority and stable evolution.
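The contrast between gap-based objectives and a binary-classification objective can be illustrated with a toy example. This is only a sketch under stated assumptions, not the objective defined in the paper: the scalar rewards `r_real` and `r_syn` and the functions `gap_loss` and `nce_style_loss` are hypothetical names chosen for illustration. It shows that a loss built on the reward gap is unchanged when both rewards shift by the same constant, while a loss that classifies real versus synthetic samples depends on each absolute value.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def gap_loss(r_real: float, r_syn: float) -> float:
    """Hypothetical gap-based loss: depends only on r_real - r_syn."""
    return -log_sigmoid(r_real - r_syn)


def nce_style_loss(r_real: float, r_syn: float) -> float:
    """Hypothetical binary-classification loss.

    Real samples are labeled 1 and synthetic samples 0, so each reward
    is penalized on its own absolute value.
    Uses log(1 - sigmoid(x)) = log_sigmoid(-x).
    """
    return -log_sigmoid(r_real) - log_sigmoid(-r_syn)


# Same gap of 1.0, but very different absolute reward values.
pairs = [(1.0, 0.0), (-4.0, -5.0)]
for r_real, r_syn in pairs:
    print(r_real, r_syn, gap_loss(r_real, r_syn), nce_style_loss(r_real, r_syn))
```

In this toy setting the gap-based loss cannot tell the two reward pairs apart, whereas the classification-style loss is much larger for the second pair, where the real sample receives a low absolute reward.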

large language model, machine learning, natural language, (19 more...)

2512.07175

Country: Asia > China (0.28)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation

Yuan, Huizhuo, Chen, Zixiang, Ji, Kaixuan, Gu, Quanquan

arXiv.org Artificial Intelligence, Feb-15-2024

Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs). While cutting-edge diffusion models such as Stable Diffusion (SD) and SDXL rely on supervised fine-tuning, their performance inevitably plateaus after seeing a certain volume of data. Recently, reinforcement learning (RL) has been employed to fine-tune diffusion models with human preference data, but it requires at least two images ("winner" and "loser" images) for each text prompt. In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion), where the diffusion model engages in competition with its earlier versions, facilitating an iterative self-improvement process. Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment. Our experiments on the Pick-a-Pic dataset reveal that SPIN-Diffusion outperforms the existing supervised fine-tuning method in aspects of human preference alignment and visual appeal right from its first iteration. By the second iteration, it exceeds the performance of RLHF-based methods across all metrics, achieving these results with less data.

diffusion model, diffusion-dpo, spin-diffusion, (15 more...)

2402.10210

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)
Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Leisure & Entertainment > Games (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Denoising Opponents Position in Partial Observation Environment

Sayareh, Aref, Sardari, Aria, Khoddami, Vahid, Zare, Nader, da Fonseca, Vinicius Prado, Soares, Amilcar

arXiv.org Artificial Intelligence, Oct-23-2023

The RoboCup competitions comprise various leagues, and the Soccer Simulation 2D League is a major one among them. A Soccer Simulation 2D (SS2D) match involves two teams, each with 11 players and a coach, competing against each other. The players can only communicate with the Soccer Simulation Server during the game. Several code bases have been released publicly to simplify team development, so researchers can focus on decision-making and implementing machine learning methods. SS2D actions and behaviors are only partially accurate due to challenges such as noise and partial observation. Therefore, one strategy is to implement alternative denoising methods to tackle observation inaccuracy. Our idea is to use machine learning methods to predict opponent positions while they have not been seen for a finite number of cycles, so that actions such as passing can be made more accurately. We explain our position prediction idea, which is powered by Long Short-Term Memory (LSTM) models and Deep Neural Networks (DNN). The results show that the LSTM and DNN predict the opponents' positions more accurately than standard algorithms such as the last-seen method.
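The last-seen baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the paper or from any SS2D code base: the class `LastSeenTracker`, its fields, and the convention that an unobserved opponent is passed as `None` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Position = Tuple[float, float]


@dataclass
class LastSeenTracker:
    """Hypothetical last-seen estimator for one opponent's position."""

    last_position: Optional[Position] = None
    cycles_unseen: int = 0

    def update(self, observation: Optional[Position]) -> Optional[Position]:
        """Feed one cycle's observation and return the current estimate.

        `observation` is None when the opponent was not seen this cycle.
        """
        if observation is not None:
            # Fresh observation: trust it and reset the staleness counter.
            self.last_position = observation
            self.cycles_unseen = 0
        elif self.last_position is not None:
            # Not seen: keep the old position and record how stale it is.
            self.cycles_unseen += 1
        return self.last_position


tracker = LastSeenTracker()
for obs in [(10.0, 5.0), None, None, (12.5, 4.0)]:
    print(tracker.update(obs), tracker.cycles_unseen)
```

The estimate stays frozen while the opponent is out of view, which is the inaccuracy a learned predictor would aim to reduce. The `cycles_unseen` counter is the kind of staleness signal such a predictor could take as input, although the abstract does not specify the models' input features.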

architecture, po count, prediction, (14 more...)

2310.14553

Country:

North America > Canada > Nova Scotia > Halifax Regional Municipality > Halifax (0.04)
North America > Canada > Newfoundland and Labrador > Newfoundland > St. John's (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Game Theory Meets AI and NLP

#artificialintelligence, Mar-7-2022, 11:51:56 GMT

Before going further, you'll need to understand the concept of game theory. Game theory is a branch of applied mathematics. It offers a range of tools for analyzing different situations (see How Game Theory Strategy Improves Decision Making). The parties involved are usually referred to as players, and the decisions they make are interdependent. Chess is a good analogy: each player's move shapes the future strategy of the opponent player.
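The interdependence of decisions can be made concrete with a toy payoff table. The sketch below is illustrative only: the payoff values and the function name `best_response` are assumptions, not taken from the article. It picks the move that maximizes one player's payoff once the opponent's move is fixed.

```python
from typing import List


def best_response(payoffs: List[List[float]], opponent_move: int) -> int:
    """Return the row player's best move against a fixed opponent move.

    payoffs[i][j] is the row player's payoff when the row player chooses
    move i and the opponent chooses move j.
    """
    column = [row[opponent_move] for row in payoffs]
    return max(range(len(column)), key=column.__getitem__)


# Hypothetical payoffs for a two-move game.
payoffs = [
    [3.0, 0.0],
    [1.0, 2.0],
]

for opponent_move in (0, 1):
    print(opponent_move, best_response(payoffs, opponent_move))
```

With these made-up payoffs the best move changes when the opponent's move changes, which is what it means for the players' decisions to be interdependent.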

ai agent, game theory, opponent player, (9 more...)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.98)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.56)

Cyrus 2D Simulation Team Description Paper 2016

Zare, Nader, Keshavarzi, Ashkan, Beheshtian, Seyed Ehsan, Mowla, Hadi, Akbarpour, Aryan, Jafari, Hossein, Baraghi, Keyvan Arab, Zarifi, Mohammad Amin, Javidan, Reza

arXiv.org Artificial Intelligence, Feb-8-2022

This description explains some algorithms, including those being implemented by Cyrus team members. Its objective is to give a brief explanation of the shoot, block, mark, and defensive decision algorithms. It also describes the parts that have been implemented. The base code that Cyrus used is agent 3.11.

algorithm, neural network, opponent, (13 more...)

2202.03726

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.06)
South America > Brazil > Paraíba > João Pessoa (0.05)
Europe > United Kingdom > England > West Yorkshire > Leeds (0.05)
(3 more...)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Sports > Soccer (0.76)

Technology:

Information Technology > Artificial Intelligence > Robots (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.62)
