Chen, Zhengxing
Self-Generated Critiques Boost Reward Modeling for Language Models
Yu, Yue, Chen, Zhengxing, Zhang, Aston, Tan, Liang, Zhu, Chenguang, Pang, Richard Yuanzhe, Qian, Yundi, Wang, Xuewei, Gururangan, Suchin, Zhang, Chao, Kambadur, Melanie, Mahajan, Dhruv, Hou, Rui
Reinforcement Learning from Human Feedback (RLHF) has been widely adopted to align large language models (LLMs) with human preferences (Ouyang et al., 2022; Touvron et al., 2023; Dubey et al., 2024; Reid et al., 2024). Central to the RLHF process is the reward model (RM), which is trained to assign scores that quantify how well the model's outputs align with human judgments. The reward model defines the optimization direction during training (e.g., the reward signal in PPO), encouraging a policy LLM to generate more helpful, honest, and harmless responses, ultimately enhancing the model's generation quality in real-world applications. Standard reward models are typically trained on preference pairs and optimized with a pairwise logistic loss (Bradley and Terry, 1952), producing a single scalar score for each response. However, a scalar score is not only hard to interpret but also fails to fully leverage the inherent language modeling capability that LLMs obtain from pretraining and post-training (Zhang et al., 2024). Consequently, these reward models tend to be less data-efficient and prone to robustness issues, such as reward hacking (Skalse et al., 2022; Singhal et al., 2023; Chen et al., 2024).
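For concreteness, below is a minimal sketch of the pairwise logistic (Bradley-Terry) objective mentioned above, assuming scalar reward scores have already been computed for the preferred and dispreferred response of each pair; the toy tensors are illustrative, not the paper's setup.

```python
import torch
import torch.nn.functional as F

def pairwise_logistic_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    r_chosen / r_rejected are scalar reward scores for the preferred and
    dispreferred response in each preference pair.
    """
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: scores a reward model might assign to a batch of 3 pairs.
r_chosen = torch.tensor([1.2, 0.3, 2.0])
r_rejected = torch.tensor([0.4, 0.5, 1.1])
print(pairwise_logistic_loss(r_chosen, r_rejected))  # lower when chosen outscores rejected
```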
Law of the Weakest Link: Cross Capabilities of Large Language Models
Zhong, Ming, Zhang, Aston, Wang, Xuewei, Hou, Rui, Xiong, Wenhan, Zhu, Chenguang, Chen, Zhengxing, Tan, Liang, Bi, Chloe, Lewis, Mike, Popuri, Sravya, Narang, Sharan, Kambadur, Melanie, Mahajan, Dhruv, Edunov, Sergey, Han, Jiawei, van der Maaten, Laurens
The development and evaluation of Large Language Models (LLMs) have largely focused on individual capabilities. However, this overlooks the intersection of multiple abilities across different types of expertise that are often required for real-world tasks, which we term cross capabilities. To systematically explore this concept, we first define seven core individual capabilities and then pair them to form seven common cross capabilities, each supported by a manually constructed taxonomy. Building on these definitions, we introduce CrossEval, a benchmark comprising 1,400 human-annotated prompts, with 100 prompts for each individual and cross capability. To ensure reliable evaluation, we involve expert annotators to assess 4,200 model responses, gathering 8,400 human ratings with detailed explanations to serve as reference examples. Our findings reveal that, in both static evaluations and attempts to enhance specific abilities, current LLMs consistently exhibit the "Law of the Weakest Link," where cross-capability performance is significantly constrained by the weakest component. Specifically, of 58 cross-capability scores from 17 models, 38 are lower than all of the component individual capabilities, while the other 20 fall between the stronger and weaker capability but closer to the weaker one. These results highlight the underperformance of LLMs on cross-capability tasks, making the identification and improvement of the weakest capabilities a critical priority for future research to optimize performance in complex, multi-dimensional scenarios.
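To make the score bookkeeping behind that 38/20 split concrete, here is a small sketch (with made-up numbers on an arbitrary rating scale) of how a cross-capability score can be placed relative to its two component capabilities.

```python
def classify_cross_score(cross: float, individual_a: float, individual_b: float) -> str:
    """Place a cross-capability score relative to its two component scores."""
    weak, strong = sorted((individual_a, individual_b))
    if cross < weak:
        return "below both individual capabilities"
    if cross <= strong:
        side = "weaker" if (cross - weak) < (strong - cross) else "stronger"
        return f"between the two, closer to the {side} capability"
    return "above both individual capabilities"

# Hypothetical scores, purely for illustration:
print(classify_cross_score(cross=3.1, individual_a=3.4, individual_b=4.2))
```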
AutoML for Large Capacity Modeling of Meta's Ranking Systems
Yin, Hang, Liu, Kuang-Hung, Sun, Mengying, Chen, Yuxin, Zhang, Buyun, Liu, Jiang, Sehgal, Vivek, Panchal, Rudresh Rajnikant, Hotaj, Eugen, Liu, Xi, Guo, Daifeng, Zhang, Jamey, Wang, Zhou, Jiang, Shali, Li, Huayu, Chen, Zhengxing, Chen, Wen-Yen, Yang, Jiyan, Wen, Wei
Web-scale ranking systems at Meta serving billions of users are complex. Improving ranking models is essential but engineering-heavy. Automated Machine Learning (AutoML) can release engineers from the labor-intensive work of tuning ranking models; however, it is unknown whether AutoML is efficient enough to meet tight real-world production timelines while also improving on strong baselines. Moreover, to achieve higher ranking performance, there is an ever-increasing demand to scale ranking models to even larger capacity, which imposes further challenges on efficiency. The large scale of the models and the tight production schedule require AutoML to outperform human baselines using only a small number of model evaluation trials (around 100). We present a sampling-based AutoML method, focusing on neural architecture search and hyperparameter optimization, that addresses these challenges in Meta-scale production when building large-capacity models. Our approach efficiently handles large-scale data demands. It leverages a lightweight predictor-based searcher and reinforcement learning to explore vast search spaces, significantly reducing the number of model evaluations. Through experiments in large-capacity modeling for CTR and CVR applications, we show that our method achieves an outstanding Return on Investment (ROI) versus human-tuned baselines, with up to 0.09% Normalized Entropy (NE) loss reduction or 25% Queries per Second (QPS) increase from sampling only one hundred models on average from a curated search space. The proposed AutoML method has already made real-world impact: a discovered Instagram CTR model with up to -0.36% NE gain (over the existing production baseline) was selected for a large-scale online A/B test and showed a statistically significant gain. These production results proved AutoML's efficacy and accelerated its adoption in ranking systems at Meta.
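The abstract does not spell out the searcher's internals, so the following is only a sketch of a generic predictor-guided sampling loop under a roughly 100-evaluation budget; the two-knob search space and the nearest-neighbor surrogate are stand-in assumptions for the lightweight predictor it mentions.

```python
import random

# Toy search space; the real one is a curated Meta-scale space.
SEARCH_SPACE = {"embedding_dim": [64, 128, 256], "num_layers": [2, 4, 8]}

def sample_config():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def true_eval(cfg):
    # Stand-in for an expensive model training + NE measurement (lower is better).
    return -(cfg["embedding_dim"] * cfg["num_layers"]) + random.gauss(0, 50)

def predicted_score(cfg, history):
    # 1-nearest-neighbor surrogate standing in for the lightweight predictor.
    def dist(a, b):
        return sum(abs(a[k] - b[k]) for k in a)
    nearest = min(history, key=lambda h: dist(h[0], cfg))
    return nearest[1]

# Seed with a handful of true evaluations, then spend the rest of the budget
# evaluating only the candidate the predictor likes best in each round.
history = [(c, true_eval(c)) for c in (sample_config() for _ in range(10))]
for _ in range(90):
    candidates = [sample_config() for _ in range(50)]
    best = min(candidates, key=lambda c: predicted_score(c, history))
    history.append((best, true_eval(best)))

print(min(history, key=lambda h: h[1]))  # best config found within ~100 trials
```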
Scaling User Modeling: Large-scale Online User Representations for Ads Personalization in Meta
Zhang, Wei, Li, Dai, Liang, Chen, Zhou, Fang, Zhang, Zhongke, Wang, Xuewei, Li, Ru, Zhou, Yi, Huang, Yaning, Liang, Dong, Wang, Kai, Wang, Zhangyuan, Chen, Zhengxing, Li, Min, Wu, Fenggang, Chen, Minghai, Li, Huayu, Wu, Yunnan, Shu, Zhan, Yuan, Mindi, Reddy, Sri
Effective user representations are pivotal in personalized advertising. However, stringent constraints on training throughput, serving latency, and memory often limit the complexity and input feature set of online ads ranking models. This challenge is magnified in extensive systems like Meta's, which encompass hundreds of models with diverse specifications, rendering the tailoring of user representation learning for each model impractical. To address these challenges, we present Scaling User Modeling (SUM), a framework widely deployed in Meta's ads ranking system, designed to facilitate efficient and scalable sharing of online user representations across hundreds of ads models. SUM leverages a few designated upstream user models to synthesize user embeddings from massive amounts of user features with advanced modeling techniques. These embeddings then serve as inputs to downstream online ads ranking models, promoting efficient representation sharing. To adapt to the dynamic nature of user features and ensure embedding freshness, we designed the SUM Online Asynchronous Platform (SOAP), a latency-free online serving system complemented with model freshness and embedding stabilization, which enables frequent user model updates and online inference of user embeddings upon each user request. We share our hands-on deployment experiences for the SUM framework and validate its superiority through comprehensive experiments. To date, SUM has been launched to hundreds of ads ranking models in Meta, processing hundreds of billions of user requests daily, yielding significant online metric gains and infrastructure cost savings.
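The following sketch illustrates the upstream/downstream split described above, with hypothetical module shapes that are not SUM's actual architecture: one upstream user model synthesizes a compact embedding that many downstream ads models consume as a frozen input feature.

```python
import torch
import torch.nn as nn

class UpstreamUserModel(nn.Module):
    """Synthesizes a compact user embedding from many raw user features."""
    def __init__(self, num_user_features=1024, embedding_dim=96):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(num_user_features, 256), nn.ReLU(),
            nn.Linear(256, embedding_dim),
        )
    def forward(self, user_features):
        return self.encoder(user_features)

class DownstreamAdsModel(nn.Module):
    """Consumes the shared user embedding plus its own ad-side features."""
    def __init__(self, embedding_dim=96, num_ad_features=128):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(embedding_dim + num_ad_features, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )
    def forward(self, user_embedding, ad_features):
        # Treat the embedding as a fixed input feature, so one upstream model
        # amortizes user representation learning across many downstream models.
        return self.head(torch.cat([user_embedding.detach(), ad_features], dim=-1))

upstream = UpstreamUserModel()
downstream = DownstreamAdsModel()
emb = upstream(torch.randn(4, 1024))
print(downstream(emb, torch.randn(4, 128)).shape)  # torch.Size([4, 1])
```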
Rankitect: Ranking Architecture Search Battling World-class Engineers at Meta Scale
Wen, Wei, Liu, Kuang-Hung, Fedorov, Igor, Zhang, Xin, Yin, Hang, Chu, Weiwei, Hassani, Kaveh, Sun, Mengying, Liu, Jiang, Wang, Xu, Jiang, Lin, Chen, Yuxin, Zhang, Buyun, Liu, Xi, Cheng, Dehua, Chen, Zhengxing, Zhao, Guang, Han, Fangqiu, Yang, Jiyan, Hao, Yuchen, Xiong, Liang, Chen, Wen-Yen
Neural Architecture Search (NAS) has demonstrated its efficacy in computer vision and potential for ranking systems. However, prior work focused on academic problems, which are evaluated at small scale under well-controlled, fixed baselines. In industrial systems, such as the ranking systems at Meta, it is unclear whether NAS algorithms from the literature can outperform production baselines, because of: (1) scale - Meta's ranking systems serve billions of users; (2) strong baselines - the baselines are production models optimized by hundreds to thousands of world-class engineers for years, since the rise of deep learning; (3) dynamic baselines - engineers may have established new and stronger baselines during the NAS search; and (4) efficiency - the search pipeline must yield results quickly in alignment with the productionization life cycle. In this paper, we present Rankitect, a NAS software framework for ranking systems at Meta. Rankitect seeks to build brand-new architectures by composing low-level building blocks from scratch. Rankitect implements and improves state-of-the-art (SOTA) NAS methods for comprehensive and fair comparison under the same search space, including sampling-based NAS, one-shot NAS, and Differentiable NAS (DNAS). We evaluate Rankitect by comparing it to multiple production ranking models at Meta. We find that Rankitect can discover new models from scratch that achieve a competitive tradeoff between Normalized Entropy loss and FLOPs. When utilizing a search space designed by engineers, Rankitect can generate better models than the engineers, achieving positive offline evaluation and online A/B test results at Meta scale.
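As one example of the NAS families Rankitect compares, here is a minimal sketch of the DNAS idea: candidate blocks are mixed with learnable architecture weights, so block choice can be optimized by gradient descent alongside model weights. The candidate operators below are illustrative stand-ins, not Rankitect's search space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedBlock(nn.Module):
    """Softmax-weighted mixture over candidate blocks (the DNAS relaxation)."""
    def __init__(self, dim=32):
        super().__init__()
        self.candidates = nn.ModuleList([
            nn.Linear(dim, dim),                            # dense
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),  # dense + ReLU
            nn.Identity(),                                  # skip connection
        ])
        # Architecture parameters, trained jointly with the model weights.
        self.alpha = nn.Parameter(torch.zeros(len(self.candidates)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * block(x) for w, block in zip(weights, self.candidates))

block = MixedBlock()
out = block(torch.randn(8, 32))
print(out.shape)  # after search, the argmax over alpha selects the final block
```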
NASRec: Weight Sharing Neural Architecture Search for Recommender Systems
Zhang, Tunhou, Cheng, Dehua, He, Yuchen, Chen, Zhengxing, Dai, Xiaoliang, Xiong, Liang, Yan, Feng, Li, Hai, Chen, Yiran, Wen, Wei
The rise of deep neural networks offers new opportunities in optimizing recommender systems. However, optimizing recommender systems using deep neural networks requires delicate architecture fabrication. We propose NASRec, a paradigm that trains a single supernet and efficiently produces abundant models/sub-architectures by weight sharing. To overcome the data multi-modality and architecture heterogeneity challenges in the recommendation domain, NASRec establishes a large supernet (i.e., search space) to search over full architectures. The supernet incorporates a versatile choice of operators and dense connectivity to minimize human effort in finding priors. The scale and heterogeneity in NASRec impose several challenges, such as training inefficiency, operator imbalance, and degraded rank correlation. We tackle these challenges by proposing single-operator any-connection sampling, operator-balancing interaction modules, and post-training fine-tuning. Our crafted models, NASRecNet, show promising results on three Click-Through Rate (CTR) prediction benchmarks, indicating that NASRec outperforms both manually designed models and existing NAS methods, achieving state-of-the-art performance. Our work is publicly available at https://github.com/facebookresearch/NasRec.
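A minimal sketch of weight-sharing supernet training with single-operator sampling, under assumed toy operators (not NASRec's actual search space): each training step activates one operator per layer, and every candidate sub-architecture reuses the same underlying weights.

```python
import random
import torch
import torch.nn as nn

class SupernetLayer(nn.Module):
    """One supernet layer holding all candidate operators side by side."""
    def __init__(self, dim=16):
        super().__init__()
        self.ops = nn.ModuleDict({
            "linear": nn.Linear(dim, dim),
            "linear_relu": nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),
            "identity": nn.Identity(),
        })

    def forward(self, x, op_name):
        # Only the sampled operator runs; its weights are shared by every
        # sub-architecture that includes it.
        return self.ops[op_name](x)

layer = SupernetLayer()
x = torch.randn(4, 16)
op = random.choice(list(layer.ops.keys()))  # single-operator sampling per step
print(op, layer(x, op).shape)
```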
Personalized Execution Time Optimization for the Scheduled Jobs
Liu, Yang, Wang, Juan, Chen, Zhengxing, Fox, Ian, Mufti, Imani, Sukumaran, Jason, He, Baokun, Sun, Xiling, Liang, Feng
Scheduled batch jobs are widely used on asynchronous computing platforms to execute various enterprise applications, including scheduled notifications and candidate pre-computation for modern recommender systems. It is important to deliver or update information to users at the right time to maintain the user experience and the execution impact. However, it is challenging to provide a versatile execution time optimization solution for per-user scheduled jobs that satisfies various product scenarios while maintaining reasonable infrastructure resource consumption. In this paper, we describe how we apply a learning-to-rank approach plus a "best time policy" to best time selection. In addition, we propose an ensemble learner that minimizes the ranking loss by efficiently leveraging multiple streams of user activity signals in our execution time scheduling decisions. In particular, we observe cannibalization across use cases competing for a user's peak time slot and introduce a coordination system to mitigate the problem. Our optimization approach has been successfully tested with production traffic serving billions of users per day, with statistically significant improvements in various product metrics, including notifications and content candidate generation. To the best of our knowledge, our study represents the first ML-based multi-tenant solution to the execution time optimization problem for scheduled jobs at large industrial scale across different product domains.
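A minimal sketch of the pairwise learning-to-rank idea applied to best time selection, assuming hypothetical slot features: the model scores candidate execution times from (user, time) features and is trained to rank slots where the user engaged above slots where they did not.

```python
import torch
import torch.nn.functional as F

# Tiny scorer over 8 assumed (user, time-slot) features.
scorer = torch.nn.Sequential(
    torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1)
)

engaged_slot_feats = torch.randn(32, 8)    # slots where the user engaged
unengaged_slot_feats = torch.randn(32, 8)  # slots where the user did not
s_pos = scorer(engaged_slot_feats)
s_neg = scorer(unengaged_slot_feats)

# Pairwise ranking loss: push engaged slots above unengaged ones.
loss = -F.logsigmoid(s_pos - s_neg).mean()
loss.backward()

# At serving time, the job would be scheduled at the argmax-scoring candidate
# hour, subject to the cross-use-case coordination described above.
print(loss.item())
```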
A Validation Tool for Designing Reinforcement Learning Environments
Xu, Ruiyang, Chen, Zhengxing
Reinforcement learning (RL) has gained increasing attention in academia and the tech industry, with launches of a variety of impactful applications and products. Although research is being actively conducted on many fronts (e.g., offline RL, performance, etc.), many RL practitioners face a challenge that has been largely ignored: determining whether a designed Markov Decision Process (MDP) is valid and meaningful. This study proposes a heuristic-based feature analysis method to validate whether an MDP is well formulated. We believe an MDP suitable for applying RL should contain a set of state features that are both sensitive to actions and predictive of rewards. We tested our method in constructed environments, showing that our approach can identify certain invalid environment formulations. As far as we know, performing validity analysis for RL problem formulation is a novel direction. We envision that our tool will serve as a motivational example to help practitioners apply RL to real-world problems more easily.
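A minimal sketch of the validity heuristic described above, on synthetic data: a state feature supports a well-formulated MDP if it is both sensitive to actions and predictive of rewards. The mean-shift and correlation statistics below are simple stand-ins, not necessarily the paper's exact tests.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
actions = rng.integers(0, 2, size=n)

good_feature = actions + rng.normal(0, 0.5, size=n)  # moves with the action
dead_feature = rng.normal(0, 1, size=n)              # ignores the action
reward = 2.0 * good_feature + rng.normal(0, 0.5, size=n)

def diagnose(feature, name):
    # (a) Sensitivity to actions: mean shift between the two action groups.
    action_sensitivity = abs(feature[actions == 1].mean() - feature[actions == 0].mean())
    # (b) Predictiveness of rewards: absolute correlation with reward.
    reward_predictiveness = abs(np.corrcoef(feature, reward)[0, 1])
    print(f"{name}: action sensitivity={action_sensitivity:.2f}, "
          f"reward correlation={reward_predictiveness:.2f}")

diagnose(good_feature, "good_feature")  # high on both -> supports a valid MDP
diagnose(dead_feature, "dead_feature")  # low on both -> red flag
```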
Reinforcement Learning-based Product Delivery Frequency Control
Liu, Yang, Chen, Zhengxing, Virochsiri, Kittipat, Wang, Juan, Wu, Jiahao, Liang, Feng
Frequency control is an important problem in modern recommender systems. It dictates the delivery frequency of recommendations to maintain product quality and efficiency. For example, the frequency of delivering promotional notifications impacts daily metrics as well as infrastructure resource consumption (e.g., CPU and memory usage). Open questions remain on what objective we should optimize to best represent long-term business value, and how we should balance daily metrics against resource consumption in a dynamically fluctuating environment. We propose a personalized methodology for the frequency control problem, which combines long-term value optimization using reinforcement learning (RL) with a robust volume control technique we term the "Effective Factor". We demonstrate statistically significant improvements in daily metrics and resource efficiency from our method in several notification applications at a scale of billions of users. To the best of our knowledge, our study represents the first deep RL application to the frequency control problem at such an industrial scale.
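The abstract does not define the Effective Factor, so the snippet below only sketches the general shape of the decision, with an assumed (hypothetical) global multiplier throttling send volume on top of the RL policy's long-term value estimates.

```python
# Hypothetical sketch, not the paper's mechanism: an RL policy supplies
# long-term value estimates for sending vs. skipping a notification, and a
# global multiplier (the name `effective_factor` is our assumption) tilts
# the comparison to control total delivery volume.

def should_send(q_send: float, q_skip: float, effective_factor: float) -> bool:
    # effective_factor > 1 loosens volume; < 1 tightens it.
    return q_send * effective_factor > q_skip

print(should_send(q_send=0.9, q_skip=1.0, effective_factor=1.2))  # True
print(should_send(q_send=0.9, q_skip=1.0, effective_factor=0.8))  # False
```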
Band-limited Soft Actor Critic Model
Campo, Miguel, Chen, Zhengxing, Kung, Luke, Virochsiri, Kittipat, Wang, Jianyu
Soft Actor Critic (SAC) algorithms show remarkable performance in complex simulated environments. A key element of SAC networks is entropy regularization, which prevents the SAC actor from optimizing against fine-grained, oftentimes transient, features of the state-action value function. This results in better sample efficiency during early training. We take this idea one step further by artificially band-limiting the spatial resolution of the target critic through the addition of a convolutional filter. We derive the closed-form solution in the linear case and show that band-limiting reduces the interdependency between the low- and high-frequency components of the state-action value approximation, allowing the critic to learn faster. In experiments, the band-limited SAC outperformed the classic twin-critic SAC in a number of Gym environments and displayed more stability in returns. We derive novel insights about SAC by adding a stochastic noise disturbance, a technique that is increasingly being used to learn robust policies that transfer well to their real-world counterparts.
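A minimal sketch of the band-limiting idea on a toy 1-D action grid: the target critic's values are convolved with a low-pass kernel before the Bellman backup, suppressing fine-grained, often transient, structure. The grid and moving-average kernel here are illustrative assumptions; the paper derives the filter in closed form for the linear case.

```python
import numpy as np

# Toy target Q-values over a discretized action axis: a smooth component
# plus a high-frequency component that band-limiting should suppress.
actions = np.linspace(-1.0, 1.0, 101)
q_target = np.sin(3 * actions) + 0.3 * np.sin(40 * actions)

kernel = np.ones(9) / 9.0  # simple low-pass (moving-average) filter
q_bandlimited = np.convolve(q_target, kernel, mode="same")

# The high-frequency component shrinks while the low-frequency trend survives,
# which is what lets the critic learn faster in early training.
print(np.abs(q_target - q_bandlimited).max())
```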