
Collaborating Authors: Wang, Xuewei


Self-Generated Critiques Boost Reward Modeling for Language Models

arXiv.org Artificial Intelligence

Reinforcement Learning from Human Feedback (RLHF) has been widely adopted to align large language models (LLMs) with human preferences (Ouyang et al., 2022; Touvron et al., 2023; Dubey et al., 2024; Reid et al., 2024). Central to the RLHF process is the reward model (RM), which is trained to assign scores that quantify how well the model's outputs align with human judgments. The reward model defines the optimization direction during training (e.g., the reward signal in PPO), encouraging the policy LLM to generate more helpful, honest, and harmless responses and ultimately enhancing the model's generation quality in real-world applications. Standard reward models are typically trained on preference pairs and optimized with a pairwise logistic loss (Bradley and Terry, 1952), producing a single scalar score for each response. However, a scalar score is not only hard to interpret but also fails to fully leverage the inherent language modeling capability that LLMs acquire from pretraining and post-training (Zhang et al., 2024). Consequently, these reward models tend to be less data-efficient and prone to robustness issues such as reward hacking (Skalse et al., 2022; Singhal et al., 2023; Chen et al., 2024).
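For readers unfamiliar with the standard setup this abstract contrasts against, the following is a minimal sketch of the pairwise logistic (Bradley-Terry) reward-modeling loss over scalar scores; the function and tensor names are illustrative placeholders, not the paper's implementation.

```python
# Minimal sketch of the pairwise (Bradley-Terry) reward-modeling loss:
# -log sigmoid(r_chosen - r_rejected), averaged over preference pairs.
# Names such as `pairwise_rm_loss` are illustrative, not from the paper.
import torch
import torch.nn.functional as F

def pairwise_rm_loss(score_chosen: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise logistic loss over scalar rewards for preferred vs. rejected responses."""
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Example: scalar rewards produced by a reward model for a batch of preference pairs.
score_chosen = torch.tensor([1.2, 0.3, 0.8])     # rewards for the preferred responses
score_rejected = torch.tensor([0.4, 0.5, -0.1])  # rewards for the rejected responses
print(pairwise_rm_loss(score_chosen, score_rejected).item())
```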


Law of the Weakest Link: Cross Capabilities of Large Language Models

arXiv.org Artificial Intelligence

The development and evaluation of Large Language Models (LLMs) have largely focused on individual capabilities. However, this overlooks the intersection of multiple abilities across different types of expertise that are often required for real-world tasks, which we term cross capabilities. To systematically explore this concept, we first define seven core individual capabilities and then pair them to form seven common cross capabilities, each supported by a manually constructed taxonomy. Building on these definitions, we introduce CrossEval, a benchmark comprising 1,400 human-annotated prompts, with 100 prompts for each individual and cross capability. To ensure reliable evaluation, we involve expert annotators to assess 4,200 model responses, gathering 8,400 human ratings with detailed explanations to serve as reference examples. Our findings reveal that, in both static evaluations and attempts to enhance specific abilities, current LLMs consistently exhibit the "Law of the Weakest Link," where cross-capability performance is significantly constrained by the weakest component. Specifically, across 58 cross-capability scores from 17 models, 38 scores are lower than all individual capabilities, while 20 fall between the stronger and weaker capability but closer to the weaker one. These results highlight the underperformance of LLMs on cross-capability tasks, making the identification and improvement of the weakest capabilities a critical priority for future research to optimize performance in complex, multi-dimensional scenarios.
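To make the "weakest link" comparison concrete, here is an illustrative sketch (not the CrossEval code) of how a cross-capability score can be classified against the two individual capability scores it combines; the function name and the example ratings are hypothetical.

```python
# Illustrative "Law of the Weakest Link" comparison: check whether a
# cross-capability score falls below both, above both, or between the two
# individual capability scores, and if between, which one it sits closer to.
def classify_cross_score(cross: float, cap_a: float, cap_b: float) -> str:
    weak, strong = sorted((cap_a, cap_b))
    if cross < weak:
        return "below both individual capabilities"
    if cross > strong:
        return "above both individual capabilities"
    # Between the two: report which endpoint it is closer to.
    return ("closer to the weaker capability"
            if (cross - weak) < (strong - cross)
            else "closer to the stronger capability")

# Hypothetical scores on a 1-5 judge scale.
print(classify_cross_score(cross=3.6, cap_a=3.4, cap_b=4.2))  # -> closer to the weaker capability
```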


Scaling User Modeling: Large-scale Online User Representations for Ads Personalization in Meta

arXiv.org Artificial Intelligence

Effective user representations are pivotal in personalized advertising. However, stringent constraints on training throughput, serving latency, and memory often limit the complexity and input feature set of online ads ranking models. This challenge is magnified in extensive systems like Meta's, which encompass hundreds of models with diverse specifications, rendering the tailoring of user representation learning for each model impractical. To address these challenges, we present Scaling User Modeling (SUM), a framework widely deployed in Meta's ads ranking system, designed to facilitate efficient and scalable sharing of online user representations across hundreds of ads models. SUM leverages a few designated upstream user models to synthesize user embeddings from massive amounts of user features with advanced modeling techniques. These embeddings then serve as inputs to downstream online ads ranking models, promoting efficient representation sharing. To adapt to the dynamic nature of user features and ensure embedding freshness, we designed the SUM Online Asynchronous Platform (SOAP), a latency-free online serving system complemented with model freshness and embedding stabilization, which enables frequent user model updates and online inference of user embeddings upon each user request. We share our hands-on deployment experiences for the SUM framework and validate its superiority through comprehensive experiments. To date, SUM has been launched to hundreds of ads ranking models in Meta, processing hundreds of billions of user requests daily, yielding significant online metric gains and infrastructure cost savings.
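The upstream/downstream split described above can be sketched as follows; this is a hypothetical toy illustration of the pattern (an upstream user model compressing many user features into a shared embedding that downstream ranking models consume as an input feature), and the module names and dimensions are assumptions, not Meta's implementation.

```python
# Toy sketch of shared user-embedding reuse: one upstream user model,
# many downstream ranking models consuming its embedding as an input.
import torch
import torch.nn as nn

class UpstreamUserModel(nn.Module):
    """Compresses a large user-feature vector into a small shared embedding."""
    def __init__(self, user_feature_dim: int, embedding_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(user_feature_dim, 256), nn.ReLU(),
            nn.Linear(256, embedding_dim),
        )

    def forward(self, user_features: torch.Tensor) -> torch.Tensor:
        return self.encoder(user_features)

class DownstreamRankingModel(nn.Module):
    """Consumes ad features plus the shared user embedding to predict a score."""
    def __init__(self, ad_feature_dim: int, user_embedding_dim: int = 64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(ad_feature_dim + user_embedding_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, ad_features: torch.Tensor, user_embedding: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.head(torch.cat([ad_features, user_embedding], dim=-1)))

user_emb = UpstreamUserModel(user_feature_dim=1000)(torch.randn(4, 1000))
score = DownstreamRankingModel(ad_feature_dim=50)(torch.randn(4, 50), user_emb)
```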


Towards the Better Ranking Consistency: A Multi-task Learning Framework for Early Stage Ads Ranking

arXiv.org Artificial Intelligence

Dividing an ads ranking system into retrieval, early, and final stages is a common practice in large-scale ads recommendation to balance efficiency and accuracy. The early-stage ranking often uses efficient models to generate candidates out of a set of retrieved ads. The candidates are then fed into a more computationally intensive but accurate final-stage ranking system to produce the final ads recommendation. As the early and final stages use different features and model architectures because of system constraints, a serious ranking consistency issue arises: the early stage has low ads recall, i.e., top ads in the final stage are ranked low in the early stage. In order to pass better ads from the early to the final stage, we propose a multi-task learning framework for early-stage ranking that captures multiple final-stage ranking components (i.e., ads clicks and ads quality events) and their task relations. With our multi-task learning framework, we not only achieve serving cost savings from model consolidation but also improve ads recall and ranking consistency. In online A/B testing, our framework achieves significantly higher click-through rate (CTR), conversion rate (CVR), and total value, as well as better ads quality (e.g., reduced ads cross-out rate) in a large-scale industrial ads ranking system.
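As a rough illustration of the multi-task idea, the sketch below shows a shared bottom over early-stage features with separate heads for a click task and an ads-quality task; this is an assumed toy architecture for exposition, not the paper's exact model, and all names and the combination weights are hypothetical.

```python
# Toy multi-task early-stage ranker: shared encoder, one head per
# final-stage component (ad clicks, ads-quality event).
import torch
import torch.nn as nn

class EarlyStageMultiTaskModel(nn.Module):
    def __init__(self, feature_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(feature_dim, hidden_dim), nn.ReLU())
        self.click_head = nn.Linear(hidden_dim, 1)    # predicts ad clicks
        self.quality_head = nn.Linear(hidden_dim, 1)  # predicts an ads-quality event

    def forward(self, x: torch.Tensor):
        h = self.shared(x)
        return torch.sigmoid(self.click_head(h)), torch.sigmoid(self.quality_head(h))

model = EarlyStageMultiTaskModel(feature_dim=64)
p_click, p_quality = model(torch.randn(8, 64))
# A simple early-stage score could combine the task outputs, e.g. a weighted sum.
early_score = 0.7 * p_click + 0.3 * p_quality
```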


How to Build User Simulators to Train RL-based Dialog Systems

arXiv.org Artificial Intelligence

User simulators are essential for training reinforcement learning (RL) based dialog models. However, building a good user simulator that models real user behaviors is challenging. We propose a method of standardizing user simulator building that can be used by the community to fairly compare dialog system quality using the same set of user simulators. We present implementations of six user simulators trained with different dialog planning and generation methods. We then calculate a set of automatic metrics to evaluate the quality of these simulators both directly and indirectly. We also ask human users to assess the simulators directly and indirectly by rating the simulated dialogs and interacting with the trained systems. This paper presents a comprehensive evaluation framework for user simulator studies and provides a better understanding of the pros and cons of different user simulators, as well as their impacts on the trained systems.

Reinforcement learning has gained more and more attention in dialog system training because it treats dialog planning as a sequential decision problem and focuses on long-term rewards (Su et al., 2017). However, RL requires interaction with the environment, and obtaining real human users to interact with the system is both time-consuming and labor-intensive. Therefore, building user simulators to interact with the system before deployment to real users becomes an economical choice (Williams et al., 2017; Li et al., 2016). But the performance of the user simulator has a direct impact on the trained RL policy.
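To show how a user simulator plugs into RL-based dialog training, here is a minimal, hypothetical sketch in which a rule-based simulator plays the user turn, a policy picks the system turn, and the episode accumulates a long-term reward; the toy simulator, reward values, and action strings are assumptions and do not correspond to any of the paper's six simulators.

```python
# Toy interaction loop between a dialog policy and a rule-based user simulator.
import random

class RuleBasedUserSimulator:
    """Toy user: keeps asking until its (hidden) goal slot is addressed."""
    def __init__(self, goal_slot: str = "price"):
        self.goal_slot = goal_slot

    def respond(self, system_action: str) -> tuple[str, bool, float]:
        if self.goal_slot in system_action:
            return "thanks, bye", True, 1.0                          # goal satisfied -> success reward
        return f"what about the {self.goal_slot}?", False, -0.05     # small per-turn penalty

def run_episode(policy, simulator, max_turns: int = 10) -> float:
    total_reward, user_utterance = 0.0, "hello"
    for _ in range(max_turns):
        system_action = policy(user_utterance)
        user_utterance, done, reward = simulator.respond(system_action)
        total_reward += reward
        if done:
            break
    return total_reward

random_policy = lambda utterance: random.choice(["inform price", "inform area", "greet"])
print(run_episode(random_policy, RuleBasedUserSimulator()))
```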


Persuasion for Good: Towards a Personalized Persuasive Dialogue System for Social Good

arXiv.org Artificial Intelligence

Developing intelligent persuasive conversational agents to change people's opinions and actions for social good is the frontier in advancing the ethical development of automated dialogue systems. To do so, the first step is to understand the intricate organization of strategic disclosures and appeals employed in human persuasion conversations. We designed an online persuasion task where one participant was asked to persuade the other to donate to a specific charity. We collected a large dataset with 1,017 dialogues and annotated emerging persuasion strategies from a subset. Based on the annotation, we built a baseline classifier with context information and sentence-level features to predict the 10 persuasion strategies used in the corpus. Furthermore, to develop an understanding of personalized persuasion processes, we analyzed the relationships between individuals' demographic and psychological backgrounds, including personality, morality, and value systems, and their willingness to donate. We then analyzed which types of persuasion strategies led to greater donation amounts depending on individuals' personal backgrounds. This work lays the groundwork for developing a personalized persuasive dialogue system.
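In the spirit of the baseline strategy classifier mentioned above, the following is a hypothetical sketch of a sentence-level classifier that includes the previous turn as crude context; the features, example sentences, and strategy labels are illustrative assumptions, not the paper's exact setup or label set.

```python
# Toy persuasion-strategy classifier: bag-of-words over the previous turn
# concatenated with the current sentence, fed to a linear classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "Your donation could feed a child for a week.",   # emotional appeal
    "The charity spends 90% of funds on programs.",   # credibility appeal
    "Would you consider donating a small amount?",    # donation request
]
contexts = ["", sentences[0], sentences[1]]           # previous persuader turn as context
labels = ["emotion-appeal", "credibility-appeal", "proposition-of-donation"]

# Concatenate context and current sentence as a simple way to include context.
inputs = [c + " " + s for c, s in zip(contexts, sentences)]
clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(inputs, labels)
print(clf.predict(["Even one dollar would help the children."]))
```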