AITopics | Zheng, Weiqiang

Zheng, Weiqiang

COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences

Liu, Yixin, Oikonomou, Argyris, Zheng, Weiqiang, Cai, Yang, Cohan, Arman

arXiv.org Artificial Intelligence, Oct-30-2024

Many alignment methods, including reinforcement learning from human feedback (RLHF), rely on the Bradley-Terry reward assumption, which is insufficient to capture the full range of general human preferences. To achieve robust alignment with general preferences, we model the alignment problem as a two-player zero-sum game, where the Nash equilibrium policy guarantees a 50% win rate against any competing policy. However, previous algorithms for finding the Nash policy either diverge or converge to a Nash policy in a modified game, even in a simple synthetic setting, thereby failing to maintain the 50% win rate guarantee against all other policies. We propose a meta-algorithm, Convergent Meta Alignment Algorithm (COMAL), for language model alignment with general preferences, inspired by convergent algorithms in game theory. Theoretically, we prove that our meta-algorithm converges to an exact Nash policy in the last iterate. Additionally, our meta-algorithm is simple and can be integrated with many existing methods designed for RLHF and preference optimization with minimal changes. Experimental results demonstrate the effectiveness of the proposed framework when combined with existing preference policy optimization methods.
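The 50% win-rate property described in the abstract can be illustrated on a toy example. The sketch below is not COMAL; it only checks the guarantee on a hypothetical cyclic preference matrix `P` over three responses (an assumption made up for illustration), where `P[i][j]` is the probability that response `i` is preferred over response `j`. Such a cycle cannot be expressed by a single Bradley-Terry score per response, since the scores would have to be strictly decreasing around a loop.

```python
# Hypothetical cyclic preferences over three responses (illustrative only):
# response 0 usually beats 1, 1 usually beats 2, and 2 usually beats 0.
P = [
    [0.5, 0.9, 0.1],
    [0.1, 0.5, 0.9],
    [0.9, 0.1, 0.5],
]


def win_rate(P, pi, mu):
    """Probability that a response drawn from policy pi is preferred over one from mu."""
    return sum(pi[i] * P[i][j] * mu[j]
               for i in range(len(pi)) for j in range(len(mu)))


def worst_case_win_rate(P, pi):
    """Minimum win rate of pi over all opponents.

    The win rate is linear in the opponent's policy, so the minimum is
    attained at a deterministic opponent and it suffices to scan the columns.
    """
    n = len(pi)
    return min(sum(pi[i] * P[i][j] for i in range(n)) for j in range(n))


uniform = [1 / 3, 1 / 3, 1 / 3]   # the Nash policy of this symmetric toy game
always_first = [1.0, 0.0, 0.0]    # a deterministic policy, for contrast

print(worst_case_win_rate(P, uniform))
print(worst_case_win_rate(P, always_first))
```

In this toy game the uniform policy is never beaten more than half the time, while the deterministic policy is exploited by the response that beats it.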

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.23223

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms

Cai, Yang, Farina, Gabriele, Grand-Clément, Julien, Kroer, Christian, Lee, Chung-Wei, Luo, Haipeng, Zheng, Weiqiang

arXiv.org Artificial Intelligence, Jun-15-2024

Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games, both in theory and practice. Particularly popular algorithms include optimistic multiplicative weights update (OMWU) and optimistic gradient-descent-ascent (OGDA). While both algorithms enjoy $O(1/T)$ ergodic convergence to Nash equilibrium in two-player zero-sum games, OMWU offers several advantages, including logarithmic dependence on the size of the payoff matrix and $\widetilde{O}(1/T)$ convergence to coarse correlated equilibria even in general-sum games. However, in terms of last-iterate convergence in two-player zero-sum games, an increasingly popular topic in this area, OGDA guarantees that the duality gap shrinks at a rate of $O(1/\sqrt{T})$, while the best existing last-iterate convergence guarantee for OMWU depends on a game-dependent constant that could be arbitrarily large. This raises the question: is this potentially slow last-iterate convergence an inherent disadvantage of OMWU, or is the current analysis too loose? Somewhat surprisingly, we show that the former is true. More generally, we prove that a broad class of algorithms that do not forget the past quickly all suffer from the same issue: for any arbitrarily small $\delta>0$, there exists a $2\times 2$ matrix game such that the algorithm admits a constant duality gap even after $1/\delta$ rounds. This class of algorithms includes OMWU and other standard optimistic follow-the-regularized-leader algorithms.
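For readers unfamiliar with the objects named in the abstract, here is a minimal sketch of OMWU self-play and the duality gap on a $2\times 2$ matrix game. It makes several assumptions not taken from the paper: the row player minimizes $x^\top A y$, both players update simultaneously, the first round uses the current loss as its prediction, and the game (matching pennies), step size, and starting point are arbitrary illustrative choices. It does not reproduce the hard instances constructed in the paper; on this well-behaved game the last iterate does approach equilibrium.

```python
import math


def duality_gap(A, x, y):
    """max_j (A^T x)_j - min_i (A y)_i for a row player minimizing x^T A y."""
    n, m = len(A), len(A[0])
    best_for_y = max(sum(A[i][j] * x[i] for i in range(n)) for j in range(m))
    best_for_x = min(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    return best_for_y - best_for_x


def omwu(A, x, y, eta, rounds):
    """Optimistic multiplicative weights update run in self-play (a sketch)."""
    n, m = len(A), len(A[0])

    def row_losses(y):
        return [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]

    def col_gains(x):
        return [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]

    def normalize(w):
        total = sum(w)
        return [v / total for v in w]

    prev_l, prev_g = row_losses(y), col_gains(x)
    for _ in range(rounds):
        # Both feedback vectors are computed before either player moves.
        l, g = row_losses(y), col_gains(x)
        # Optimism: step along 2 * (current feedback) - (previous feedback).
        x = normalize([x[i] * math.exp(-eta * (2 * l[i] - prev_l[i]))
                       for i in range(n)])
        y = normalize([y[j] * math.exp(eta * (2 * g[j] - prev_g[j]))
                       for j in range(m)])
        prev_l, prev_g = l, g
    return x, y


A = [[1.0, -1.0], [-1.0, 1.0]]   # matching pennies (illustrative choice)
x0, y0 = [0.8, 0.2], [0.3, 0.7]  # arbitrary interior starting point
x_T, y_T = omwu(A, x0, y0, eta=0.1, rounds=5000)
print(duality_gap(A, x0, y0), duality_gap(A, x_T, y_T))
```

The printed pair compares the duality gap at the starting point with the gap of the last iterate, which is the quantity the abstract's lower bound concerns.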

artificial intelligence, game theory, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2406.10631

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games

Cai, Yang, Luo, Haipeng, Wei, Chen-Yu, Zheng, Weiqiang

arXiv.org Artificial Intelligence, Jan-26-2024

We study policy optimization algorithms for computing correlated equilibria in multi-player general-sum Markov games. Previous results achieve an $O(T^{-1/2})$ convergence rate to a correlated equilibrium and an accelerated $O(T^{-3/4})$ convergence rate to the weaker notion of coarse correlated equilibrium. In this paper, we improve both results significantly by providing an uncoupled policy optimization algorithm that attains a near-optimal $\tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium. Our algorithm is constructed by combining two main elements: (i) smooth value updates and (ii) the optimistic-follow-the-regularized-leader algorithm with the log barrier regularizer.

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2401.1524

Country: North America > United States > California (0.14)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Game Theory (0.93)

Learning Thresholds with Latent Values and Censored Feedback

Zhang, Jiahao, Lin, Tao, Zheng, Weiqiang, Feng, Zhe, Teng, Yifeng, Deng, Xiaotie

arXiv.org Artificial Intelligence, Dec-7-2023

In this paper, we investigate the problem of actively learning a threshold in a latent space, where the unknown reward $g(\gamma, v)$ depends on the proposed threshold $\gamma$ and the latent value $v$, and can be achieved only if the threshold is lower than or equal to the unknown latent value. This problem has broad applications in practical scenarios, e.g., reserve price optimization in online auctions, online task assignments in crowdsourcing, and setting recruiting bars in hiring. We first characterize the query complexity of learning a threshold whose expected reward is at most $\epsilon$ smaller than the optimum and prove that the number of queries needed can be infinitely large even when $g(\gamma, v)$ is monotone with respect to both $\gamma$ and $v$. On the positive side, we provide a tight query complexity of $\tilde{\Theta}(1/\epsilon^3)$ when $g$ is monotone and the CDF of the value distribution is Lipschitz. Moreover, we show that a tight $\tilde{\Theta}(1/\epsilon^3)$ query complexity can be achieved as long as $g$ satisfies one-sided Lipschitzness, which provides a complete characterization for this problem. Finally, we extend this model to an online learning setting and demonstrate a tight $\Theta(T^{2/3})$ regret bound using continuous-arm bandit techniques and the aforementioned query complexity results.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2312.04653

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Industry:

Law > Civil Rights & Constitutional Law (0.50)
Education (0.50)
Information Technology > Services (0.34)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.68)

Doubly Optimal No-Regret Learning in Monotone Games

Cai, Yang, Zheng, Weiqiang

arXiv.org Artificial Intelligence, Sep-4-2023

We consider online learning in multi-player smooth monotone games. Existing algorithms have limitations such as (1) being only applicable to strongly monotone games; (2) lacking the no-regret guarantee; (3) having only asymptotic or slow $O(\frac{1}{\sqrt{T}})$ last-iterate convergence rate to a Nash equilibrium. While the $O(\frac{1}{\sqrt{T}})$ rate is tight for a large class of algorithms including the well-studied extragradient algorithm and optimistic gradient algorithm, it is not optimal for all gradient-based algorithms. We propose the accelerated optimistic gradient (AOG) algorithm, the first doubly optimal no-regret learning algorithm for smooth monotone games. Namely, our algorithm achieves both (i) the optimal $O(\sqrt{T})$ regret in the adversarial setting under smooth and convex loss functions and (ii) the optimal $O(\frac{1}{T})$ last-iterate convergence rate to a Nash equilibrium in multi-player smooth monotone games. As a byproduct of the accelerated last-iterate convergence rate, we further show that each player suffers only an $O(\log T)$ individual worst-case dynamic regret, providing an exponential improvement over the previous state-of-the-art $O(\sqrt{T})$ bound.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2301.1312

Country: North America > United States > Hawaii (0.14)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Accelerated Single-Call Methods for Constrained Min-Max Optimization

Cai, Yang, Zheng, Weiqiang

arXiv.org Artificial Intelligence, May-14-2023

We study first-order methods for constrained min-max optimization. Existing methods either require two gradient calls or two projections in each iteration, which may be costly in some applications. In this paper, we first show that a variant of the Optimistic Gradient (OG) method, a single-call single-projection algorithm, has an $O(\frac{1}{\sqrt{T}})$ best-iterate convergence rate for inclusion problems with operators that satisfy the weak Minty variational inequality (MVI). Our second result is the first single-call single-projection algorithm, the Accelerated Reflected Gradient (ARG) method, that achieves the optimal $O(\frac{1}{T})$ last-iterate convergence rate for inclusion problems that satisfy negative comonotonicity. Both the weak MVI and negative comonotonicity are well-studied assumptions and capture a rich set of non-convex non-concave min-max optimization problems. Finally, we show that the Reflected Gradient (RG) method, another single-call single-projection algorithm, has an $O(\frac{1}{\sqrt{T}})$ last-iterate convergence rate for constrained convex-concave min-max optimization, answering an open problem of [Hsieh et al., 2019]. Our convergence rates hold for standard measures such as the tangent residual and the natural residual.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

2210.03096

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Nash Convergence of Mean-Based Learning Algorithms in First Price Auctions

Deng, Xiaotie, Hu, Xinyan, Lin, Tao, Zheng, Weiqiang

arXiv.org Artificial Intelligence, Oct-8-2021

A fundamental question in the field of Learning and Games is Nash convergence of online learning dynamics: if the players in a repeated game employ some online learning algorithms to adjust strategies, will their strategies converge to the Nash equilibrium of the game? Although the answer to this question is "no" in general (see Related Works for details), positive results do exist for some special cases of online learning algorithms and games: for example, no-regret learning algorithms provably converge to Nash equilibria in zero-sum games, $2\times 2$ games, and routing games (see e.g., Fudenberg and Levine, 1998; Cesa-Bianchi and Lugosi, 2006; Nisan et al., 2007). In this work, we analyze Nash convergence of online learning dynamics in repeated auctions, where bidders learn to bid using online learning algorithms. Although auctions are of both theoretical and practical importance, little is known about their Nash convergence properties, even for the perhaps simplest and most popular auction, the single-item first-price sealed-bid auction (or first price auction for short). One of the obstacles to the theoretical analysis of Nash convergence in the first price auction is the lack of explicit characterization of its Nash equilibrium.

artificial intelligence, educational setting, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2110.03906

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.28)

Genre: Research Report > New Finding (0.45)

Industry: Education (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
