Guan, Ziwei
HyperZero: A Customized End-to-End Auto-Tuning System for Recommendation with Hourly Feedback
Cai, Xufeng, Guan, Ziwei, Yuan, Lei, Aydin, Ali Selman, Xu, Tengyu, Liu, Boying, Ren, Wenbo, Xiang, Renkai, He, Songyi, Yang, Haichuan, Li, Serena, Gao, Mingze, Weng, Yue, Liu, Ji
Modern recommendation systems can be broadly divided into two key stages: the ranking stage, where the system predicts various user engagements (e.g., click-through rate, like rate, follow rate, watch time), and the value model stage, which aggregates these predictive scores through a function (e.g., a linear combination defined by a weight vector) to measure the value of each content item with a single numerical score. Both stages play roughly equally important roles in real industrial systems; however, how to optimize the model weights for the second stage still lacks systematic study. This paper focuses on optimizing the second stage through auto-tuning technology. Although general auto-tuning systems - both established production practices and open-source solutions - can address this problem, they typically require weeks or even months to identify a feasible solution. Such prolonged tuning processes are unacceptable in production recommendation systems, as suboptimal value models can severely degrade user experience. An effective auto-tuning solution must identify a viable model within 2-3 days, rather than the extended timelines typical of existing approaches. In this paper, we introduce a practical auto-tuning system named HyperZero that meets these time constraints while effectively addressing the unique challenges inherent in modern recommendation systems. Moreover, the framework can potentially be extended to broader tuning tasks within recommendation systems.
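As a concrete illustration of the value-model stage described above, the sketch below aggregates predicted engagement scores into a single value score via a linear combination with a weight vector; the scores, weights, and function names are hypothetical examples and not part of HyperZero's actual implementation.

```python
# Minimal sketch of the value-model stage: per-item engagement predictions are
# combined into one value score by a weight vector (the quantity an auto-tuning
# system such as HyperZero would search over). All numbers here are made up.
import numpy as np

def value_scores(pred_scores: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """pred_scores: (num_items, num_engagements), e.g. columns for CTR,
    like rate, follow rate, watch time; weights: (num_engagements,)."""
    return pred_scores @ weights

preds = np.array([[0.12, 0.05, 0.01, 30.0],
                  [0.08, 0.09, 0.02, 45.0]])
w = np.array([1.0, 0.8, 2.0, 0.01])              # hypothetical weight vector to be tuned
ranking = np.argsort(-value_scores(preds, w))    # rank items by value score, highest first
```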
A Primal-Dual Approach to Bilevel Optimization with Multiple Inner Minima
Sow, Daouda, Ji, Kaiyi, Guan, Ziwei, Liang, Yingbin
Bilevel optimization has found extensive applications in modern machine learning problems such as hyperparameter optimization, neural architecture search, meta-learning, etc. While bilevel problems with a unique inner minimal point (e.g., where the inner function is strongly convex) are well understood, such problems with multiple inner minimal points remain challenging and open. Existing algorithms designed for this setting apply only to restricted situations and do not come with full convergence guarantees. In this paper, we adopt a reformulation of bilevel optimization as constrained optimization, and solve the problem via a primal-dual bilevel optimization (PDBO) algorithm. PDBO not only addresses the multiple-inner-minima challenge, but is also fully first-order, involving no second-order Hessian or Jacobian computations, in contrast to most existing gradient-based bilevel algorithms. We further characterize the convergence rate of PDBO, which serves as the first known non-asymptotic convergence guarantee for bilevel optimization with multiple inner minima. Our experiments demonstrate the desirable performance of the proposed approach.
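For readers unfamiliar with the value-function reformulation the abstract builds on, the following is a minimal numerical sketch on toy quadratics (not the paper's PDBO pseudocode): the bilevel problem is rewritten as a constrained problem and solved by primal gradient descent and dual gradient ascent, using only first-order information.

```python
# Bilevel problem:  min_x f(x, y)  s.t.  y in argmin_z g(x, z)
# Reformulation:    min_{x,y} f(x, y)  s.t.  g(x, y) - v(x) <= 0,  v(x) = min_z g(x, z)
# Solved below with primal descent / dual ascent on the Lagrangian (toy example).
import numpy as np

f  = lambda x, y: (y - 1.0) ** 2 + 0.1 * x ** 2
g  = lambda x, y: 0.5 * (y - x) ** 2
df = lambda x, y: np.array([0.2 * x, 2.0 * (y - 1.0)])   # grad of f w.r.t. (x, y)
dg = lambda x, y: np.array([-(y - x), (y - x)])          # grad of g w.r.t. (x, y)

def inner_min(x, steps=50, lr=0.5):
    """Approximate z*(x) = argmin_z g(x, z) with a few inner gradient steps."""
    z = 0.0
    for _ in range(steps):
        z -= lr * (z - x)
    return z

x, y, lam = 3.0, -2.0, 0.0
for _ in range(2000):
    z = inner_min(x)
    # gradient of the constraint g(x, y) - v(x); Danskin: dv/dx = dg/dx evaluated at z*(x)
    dcon = dg(x, y) - np.array([dg(x, z)[0], 0.0])
    grad = df(x, y) + lam * dcon
    x, y = x - 0.01 * grad[0], y - 0.01 * grad[1]         # primal descent
    lam = max(0.0, lam + 0.1 * (g(x, y) - g(x, z)))       # dual ascent, lam >= 0
# as lam grows, (x, y) drifts toward the constrained optimum y = x of this toy problem
```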
PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method
Guan, Ziwei, Xu, Tengyu, Liang, Yingbin
As a major value function evaluation method, temporal difference (TD) learning (Sutton, 1988; Dayan, 1992) has been widely used in various planning problems in reinforcement learning. Although TD learning performs successfully in on-policy settings, where an agent can interact with the environment under the target policy, it can perform poorly or even diverge in off-policy settings, where the agent only has access to data sampled by a behavior policy (Baird, 1995; Tsitsiklis and Van Roy, 1997; Mahmood et al., 2015). To address this issue, the gradient temporal-difference (GTD) (Sutton et al., 2008) and least-squares temporal difference (LSTD) (Yu, 2010) algorithms have been proposed and shown to converge in off-policy settings. However, since GTD and LSTD optimize an objective function based on the behavior policy, their convergence points can be heavily biased away from the true value function due to the distribution mismatch between the target and behavior policies, even when the expressive power of the function approximation class is arbitrarily large (Kolter, 2011). To provide a more accurate evaluation, Sutton et al. (2016) proposed the emphatic temporal difference (ETD) algorithm, which introduces the follow-on trace to address the distribution mismatch issue. The stability of ETD was shown in Sutton et al. (2016); Mahmood et al. (2015), and the asymptotic convergence guarantee for ETD was established in Yu (2015). ETD has also achieved great success in many tasks (Ghiassian et al., 2016; Ni, 2021).
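As background for the follow-on trace mentioned above, here is a minimal sketch of ETD(0) with linear function approximation, following the update rules in Sutton et al. (2016); the toy environment, feature map, and policies are placeholders chosen purely for illustration.

```python
# ETD(0) sketch: the follow-on trace F reweights ("emphasizes") TD updates so that
# off-policy evaluation targets the right state distribution.
import numpy as np

def etd0(env_step, phi, pi, mu, theta, gamma=0.95, alpha=0.01, num_steps=10_000):
    """env_step(s, a) -> (reward, next_state); phi(s) -> feature vector;
    pi / mu: target / behavior action probabilities pi[s][a]."""
    s, F = 0, 1.0                                    # follow-on trace; interest i(s) = 1
    for _ in range(num_steps):
        a = np.random.choice(len(mu[s]), p=mu[s])    # act under the behavior policy
        r, s_next = env_step(s, a)
        rho = pi[s][a] / mu[s][a]                    # importance-sampling ratio
        delta = r + gamma * theta @ phi(s_next) - theta @ phi(s)
        theta += alpha * F * rho * delta * phi(s)    # emphasis M_t = F_t when lambda = 0
        F = gamma * rho * F + 1.0                    # follow-on trace update
        s = s_next
    return theta

# Tiny two-state chain as a placeholder environment.
P = {0: {0: (0.0, 0), 1: (1.0, 1)}, 1: {0: (0.0, 0), 1: (0.0, 1)}}
env_step = lambda s, a: P[s][a]
phi = lambda s: np.eye(2)[s]
pi = {0: [0.1, 0.9], 1: [0.1, 0.9]}                  # target policy
mu = {0: [0.5, 0.5], 1: [0.5, 0.5]}                  # behavior policy
theta_hat = etd0(env_step, phi, pi, mu, theta=np.zeros(2))
```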
When Will Generative Adversarial Imitation Learning Algorithms Attain Global Convergence
Guan, Ziwei, Xu, Tengyu, Liang, Yingbin
Generative adversarial imitation learning (GAIL) is a popular inverse reinforcement learning approach for jointly optimizing the policy and the reward from expert trajectories. A central question about GAIL is whether a given policy gradient algorithm applied to GAIL attains a global minimizer (i.e., recovers the expert policy); existing understanding of this question is very limited. Such global convergence has been shown only for linear (or linear-type) MDPs and linear (or linearizable) rewards. In this paper, we study GAIL under general MDPs and for nonlinear reward function classes (as long as the objective function is strongly concave with respect to the reward parameter). We characterize global convergence with a sublinear rate for a broad range of commonly used policy gradient algorithms, all implemented in an alternating manner with stochastic gradient ascent for the reward update, including projected policy gradient (PPG)-GAIL, Frank-Wolfe policy gradient (FWPG)-GAIL, trust region policy optimization (TRPO)-GAIL and natural policy gradient (NPG)-GAIL. This is the first systematic theoretical study of GAIL for global convergence.
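To make the alternating scheme concrete, the sketch below pairs an exact policy-gradient step with a gradient-ascent step on an L2-regularized (hence strongly concave) linear reward, on a small synthetic tabular MDP; it illustrates only the loop structure, not the specific PPG, FWPG, TRPO, or NPG variants analyzed in the paper, and every quantity is synthetic.

```python
# Alternating GAIL-style loop: reward ascent to separate expert and agent
# occupancies, policy-gradient ascent to chase the learned reward.
import numpy as np

nS, nA, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))         # P[s, a] = distribution over next states
rho0 = np.ones(nS) / nS                                # initial state distribution

def occupancy(pi):
    """Discounted state-action occupancy d(s, a) of policy pi (shape nS x nA)."""
    P_pi = np.einsum('sap,sa->sp', P, pi)
    d_s = np.linalg.solve(np.eye(nS) - gamma * P_pi.T, (1 - gamma) * rho0)
    return d_s[:, None] * pi

def policy_gradient(pi, r):
    """Exact softmax policy gradient of J(pi) = <d_pi, r> w.r.t. the logits."""
    P_pi = np.einsum('sap,sa->sp', P, pi)
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, (pi * r).sum(axis=1))
    Q = r + gamma * P @ V
    d_s = occupancy(pi).sum(axis=1)
    return d_s[:, None] * pi * (Q - V[:, None]) / (1 - gamma)

d_exp = occupancy(rng.dirichlet(np.ones(nA), size=nS))  # "expert" occupancy (stand-in for trajectories)
theta, r, lam_reg = np.zeros((nS, nA)), np.zeros((nS, nA)), 1.0
for _ in range(500):
    pi = np.exp(theta) / np.exp(theta).sum(axis=1, keepdims=True)
    # reward ascent on the strongly concave objective <d_exp - d_pi, r> - 0.5 * lam_reg * ||r||^2
    r += 0.05 * (d_exp - occupancy(pi) - lam_reg * r)
    # policy-gradient ascent on J(pi) = <d_pi, r>, pulling d_pi toward d_exp
    theta += 0.1 * policy_gradient(pi, r)
# after the loop, occupancy(pi) should move toward d_exp (the policy imitates the expert)
```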