How Humans Help LLMs: Assessing and Incentivizing Human Preference Annotators
Liu, Shang, Wang, Hanzhao, Ma, Zhongyao, Li, Xiaocheng
Human-annotated preference data play an important role in aligning large language models (LLMs). In this paper, we investigate the questions of assessing the performance of human annotators and incentivizing them to provide high-quality annotations. The quality assessment of language/text annotation faces two challenges: (i) the intrinsic heterogeneity among annotators, which rules out the classic methods that assume the existence of an underlying true label; and (ii) the unclear relationship between the annotation quality and the performance of downstream tasks, which excludes the possibility of inferring the annotators' behavior from the performance of models trained on the annotation data. We then formulate a principal-agent model to characterize the behaviors of, and the interactions between, the company and the human annotators. The model rationalizes a practical bonus scheme to incentivize annotators that benefits both parties, and it underscores the importance of the joint presence of an assessment system and a proper contract scheme. From a technical perspective, our analysis extends the existing literature on the principal-agent model by considering a continuous action space for the agent. We show that the gap between the first-best and the second-best solutions (under the continuous action space) is of order $\Theta(1/\sqrt{n \log n})$ for binary contracts and $\Theta(1/n)$ for linear contracts, where $n$ is the number of samples used for performance assessment; this contrasts with the known result of $\exp(-\Theta(n))$ for binary contracts when the action space is discrete. Throughout the paper, we use real preference annotation data to accompany our discussions.
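To make the bonus-scheme mechanism concrete, here is a minimal Monte Carlo sketch of an annotator best-responding to a binary contract: a bonus is paid only if the empirical pass rate over $n$ assessed annotations clears a threshold. The quadratic effort cost, the pass-probability-equals-effort assumption, and all parameter values are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def agent_utility(effort, n, bonus, threshold, cost, trials=20_000):
    # Each of the n assessed annotations passes the quality check
    # independently with probability `effort` (a stylized assumption);
    # the annotator earns `bonus` iff the empirical pass rate clears
    # `threshold`, and pays a quadratic effort cost.
    pass_rates = rng.binomial(n, effort, size=trials) / n
    return bonus * (pass_rates >= threshold).mean() - cost * effort**2

# The annotator best-responds over a continuous effort range (here a grid).
n, bonus, threshold, cost = 50, 1.0, 0.8, 0.4
efforts = np.linspace(0.0, 1.0, 41)
best = max(efforts, key=lambda a: agent_utility(a, n, bonus, threshold, cost))
print(f"best-response effort with n={n} assessment samples: {best:.3f}")
```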
Understanding the Training and Generalization of Pretrained Transformer for Sequential Decision Making
Wang, Hanzhao, Pan, Yu, Sun, Fupeng, Liu, Shang, Talluri, Kalyan, Chen, Guanting, Li, Xiaocheng
In this paper, we consider the supervised pretrained transformer for a class of sequential decision-making problems. The class of problems is a subset of the general formulation of reinforcement learning in that there is no transition probability matrix; it covers bandits, dynamic pricing, and newsvendor problems as special cases. Such a structure enables the use of optimal actions/decisions in the pretraining phase, and this usage also provides new insights into the training and generalization of the pretrained transformer. We first note that the training of the transformer model can be viewed as a performative prediction problem, and that existing methods and theories largely ignore or cannot resolve the resulting out-of-distribution issue. We propose a natural solution that includes the transformer-generated action sequences in the training procedure, and it enjoys better properties both numerically and theoretically. The availability of the optimal actions in the considered tasks also allows us to analyze the properties of the pretrained transformer as an algorithm, explaining why it may lack exploration and how this can be automatically resolved. Numerically, we categorize the advantages of the pretrained transformer over structured algorithms such as UCB and Thompson sampling into three cases: (i) it better utilizes the prior knowledge in the pretraining data; (ii) it elegantly handles the misspecification issue suffered by the structured algorithms; (iii) for short time horizons such as $T\le50$, it behaves more greedily and enjoys much better regret than the structured algorithms, which are designed for asymptotic optimality.
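As a rough illustration of the proposed fix for the out-of-distribution issue, the sketch below builds supervised pretraining pairs whose histories come from the model's own action sequences rather than from expert trajectories alone. The two-armed bandit environment, the placeholder uniform policy, and all dimensions are assumptions for illustration; the paper trains a transformer on such (history, optimal action) pairs.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_task():
    # A two-armed bandit task; the optimal action is known at pretraining
    # time, mirroring the availability of optimal decisions in the paper.
    means = rng.uniform(0, 1, size=2)
    return means, int(np.argmax(means))

def rollout(policy, means, horizon=20):
    # Let the *current model* choose the actions that generate the history,
    # so training covers the distribution the model itself induces at test
    # time (the performative-prediction / out-of-distribution fix).
    history, prefixes = [], []
    for _ in range(horizon):
        a = policy(history)
        r = rng.normal(means[a], 0.1)
        prefixes.append(list(history))
        history.append((a, r))
    return prefixes

def make_dataset(policy, num_tasks=100):
    # Supervised pairs: (model-generated history, optimal action).
    data = []
    for _ in range(num_tasks):
        means, a_star = sample_task()
        data += [(h, a_star) for h in rollout(policy, means)]
    return data

uniform_policy = lambda history: int(rng.integers(0, 2))
print(len(make_dataset(uniform_policy)), "training pairs from model-generated histories")
```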
Towards Better Statistical Understanding of Watermarking LLMs
Cai, Zhongze, Liu, Shang, Wang, Hanzhao, Zhong, Huaiyang, Li, Xiaocheng
As the abilities of large language models (LLMs) evolve rapidly, their applications have gradually touched every corner of our daily lives. However, these fast-developing tools raise concerns about misuse: LLMs could harm human society in ways such as launching bots on social media, creating fake news and content, and cheating on school essays. The overwhelming amount of synthetic data created by LLMs rather than real humans also hampers efforts to improve LLMs themselves: the synthetic data pollutes the data pool and should be detected and removed to create a high-quality dataset before training (Radford et al., 2023). Numerous attempts have been made to make such detection possible; they mainly fall into two categories: post hoc detection, which does not modify the language model, and watermarking, which changes the output to encode information in the content. Post hoc detection aims to train models that directly label texts without monitoring the generation process. Although post hoc detection does not require access to modify the output of LLMs, it does make use of statistical features such as the internal activations of the LLMs. For example, when inspected by another LLM, the statistical properties of machine-generated texts deviate from those of human-generated texts in aspects such as the distributions of token log-likelihoods (Gehrmann et al., 2019; Ippolito et al., 2019; Zellers et al., 2019; Solaiman et al., 2019; Tian, 2023; Mitchell et al., 2023). However, post hoc methods usually rely on the fundamental assumption that machine-generated texts statistically deviate from human-generated texts, an assumption that can be challenged in two ways.
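A minimal sketch of the kind of token log-likelihood statistic that post hoc detectors rely on, assuming access to some inspecting model's next-token log-probabilities. The `logprob_fn` interface, the toy unigram scorer, and the threshold are hypothetical stand-ins, not any specific detector from the cited works.

```python
import math

def mean_token_loglik(tokens, logprob_fn):
    # Average per-token log-likelihood under an inspecting language model.
    # `logprob_fn(prefix, token)` is a placeholder for any model that
    # scores the next token given a prefix (e.g., a pretrained causal LM).
    total = 0.0
    for i, tok in enumerate(tokens):
        total += logprob_fn(tokens[:i], tok)
    return total / len(tokens)

def flag_as_machine(tokens, logprob_fn, threshold=-3.0):
    # Heuristic: machine-generated text tends to have atypically high
    # likelihood under the inspecting model; the threshold is illustrative.
    return mean_token_loglik(tokens, logprob_fn) > threshold

# Toy scorer: a unigram model over a tiny vocabulary (stand-in for an LLM).
freqs = {"the": 0.2, "cat": 0.1, "sat": 0.05, "on": 0.15, "mat": 0.05}
toy_logprob = lambda prefix, tok: math.log(freqs.get(tok, 0.01))
print(flag_as_machine("the cat sat on the mat".split(), toy_logprob))
```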
When No-Rejection Learning is Consistent for Regression with Rejection
Li, Xiaocheng, Liu, Shang, Sun, Chunlin, Wang, Hanzhao
Learning with rejection has been a prototypical model for studying human-AI interaction on prediction tasks. Upon the arrival of a sample instance, the model first uses a rejector to decide whether to accept the sample and use the AI predictor to make a prediction, or to reject and defer the sample to humans. Learning such a model changes the structure of the original loss function and often results in undesirable non-convexity and inconsistency issues. For the classification with rejection problem, several works develop consistent surrogate losses for the joint learning of the predictor and the rejector, while there have been fewer works on the regression counterpart. This paper studies the regression with rejection (RwR) problem and investigates a no-rejection learning strategy that uses all the data to learn the predictor. We first establish the consistency of this strategy under a weak realizability condition. Then, for the case without weak realizability, we show that the excess risk can be upper bounded by the sum of two terms: the prediction error and the calibration error. Lastly, we demonstrate the advantage of the proposed learning strategy with empirical evidence.
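A minimal sketch of one natural instantiation of no-rejection learning: fit the predictor on all samples, then separately estimate the conditional error and defer whenever it exceeds the rejection cost. The linear models, the synthetic heteroskedastic data, and the cost value are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)

# Synthetic data with covariate-dependent noise, so rejection is useful.
X = rng.uniform(-1, 1, size=(2000, 1))
y = 2 * X[:, 0] + rng.normal(0, 0.1 + 0.9 * (X[:, 0] > 0.5), size=2000)

# Step 1 (no-rejection learning): fit the predictor on ALL samples,
# ignoring the rejector entirely.
f = LinearRegression().fit(X, y)

# Step 2: separately estimate the conditional squared error, and defer
# to the human whenever it exceeds the rejection cost c.
resid2 = (y - f.predict(X)) ** 2
err_model = LinearRegression().fit(X, resid2)

c = 0.25  # illustrative cost of deferring a sample to a human
reject = err_model.predict(X) > c
print(f"deferral rate: {reject.mean():.2%}")
```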
Transformer Choice Net: A Transformer Neural Network for Choice Prediction
Wang, Hanzhao, Li, Xiaocheng, Talluri, Kalyan
Firms are interested in understanding the choice behavior of their customers as well as forecasting the sales of their items. When customers choose at most one item per shopping instance, discrete-choice models estimate the probability of the choice, either at a segment level or at an individual customer level, based on a latent utility function of the features of the item, the customer, and the offered assortment. However, there are many situations where customers choose multiple items in a single shopping instance, either from the same category or across categories. The firm may observe only the final choices made by the customer (as in physical retail) or the precise sequence of those choices (as in an e-commerce setting). Multi-choice models are used for the former case to estimate the probability of choosing a subset of items, amongst all possible subsets of the given assortment, considering potential interactions amongst the items and their features. Sequential choice models consider the sequence of choices, taking into account not only the item and customer features but also what the customer has chosen so far, to predict the subsequent choice(s). Modeling and predicting the choice probabilities in these situations is challenging: the complexity of sequential and multi-choice models is considerably greater than in the single-choice case because of the combinatorial explosion in the number of possible customer journeys and final choices, and consequently models for multiple choices are less widely adopted in practice. In this paper, we introduce the Transformer Choice Net, a neural network based on the Transformer architecture (Vaswani et al., 2017), as a data-driven solution that works under any of the three settings: single, sequential, and multiple choices.
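A minimal PyTorch sketch in the spirit of the Transformer Choice Net, showing how self-attention over the offered items can capture assortment effects before per-item scores are normalized into choice probabilities. The dimensions, the two-layer encoder, and the single-choice softmax head are illustrative simplifications, not the architecture as specified in the paper.

```python
import torch
import torch.nn as nn

class ChoiceNetSketch(nn.Module):
    """Sketch: self-attention over the items in the offered assortment,
    followed by per-item scores normalized into choice probabilities."""

    def __init__(self, feat_dim=8, d_model=32, nhead=4):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead,
                                           dim_feedforward=64,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score = nn.Linear(d_model, 1)

    def forward(self, items, offered):
        # items:   (batch, assortment_size, feat_dim) item/customer features
        # offered: (batch, assortment_size) boolean mask of the assortment
        h = self.encoder(self.embed(items), src_key_padding_mask=~offered)
        logits = self.score(h).squeeze(-1)
        logits = logits.masked_fill(~offered, float("-inf"))  # assortment effect
        return logits.softmax(dim=-1)  # single-choice probabilities

items = torch.randn(2, 5, 8)
offered = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]], dtype=torch.bool)
print(ChoiceNetSketch()(items, offered).shape)  # torch.Size([2, 5])
```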
Learning to Make Adherence-Aware Advice
Chen, Guanting, Li, Xiaocheng, Sun, Chunlin, Wang, Hanzhao
As artificial intelligence (AI) systems play an increasingly prominent role in human decision-making, challenges surface in human-AI interactions. One challenge arises from suboptimal AI policies that inadequately account for humans disregarding AI recommendations, as well as from the need for AI to provide advice selectively, when it is most pertinent. This paper presents a sequential decision-making model that (i) takes into account the human's adherence level (the probability that the human follows/rejects machine advice) and (ii) incorporates a defer option so that the machine can temporarily refrain from offering advice. We provide learning algorithms that learn the optimal advice policy and offer advice only at critical time stamps. Compared to problem-agnostic reinforcement learning algorithms, our specialized learning algorithms not only enjoy better theoretical convergence properties but also show strong empirical performance.
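A toy value-iteration sketch of adherence-aware advice with a defer option: advised actions are followed only with probability `theta`, and a small advising cost makes the machine advise only where advice genuinely changes the outcome. The chain environment, the human's default policy, and all parameter values are assumptions for illustration, not the paper's learning algorithms (which must also learn these quantities online).

```python
import numpy as np

S, H = 4, 10            # line of states 0..3 (reward at state 3), horizon H
theta = 0.7             # adherence level: P(human follows the advice)
advice_cost = 0.02      # small cost of issuing advice
HUMAN = -1              # the human's default move drifts away from the goal
step = lambda s, a: min(max(s + a, 0), S - 1)
reward = lambda s: 1.0 if s == S - 1 else 0.0

V = np.zeros(S)
advise = np.zeros((H, S), dtype=bool)
for t in reversed(range(H)):
    V_new = np.empty(S)
    for s in range(S):
        # Defer option: stay silent and let the human act on their own.
        q_defer = reward(s) + V[step(s, HUMAN)]
        # Advise "move right": followed w.p. theta, ignored otherwise.
        q_advise = (reward(s) - advice_cost
                    + theta * V[step(s, +1)]
                    + (1 - theta) * V[step(s, HUMAN)])
        advise[t, s] = q_advise > q_defer
        V_new[s] = max(q_advise, q_defer)
    V = V_new
print("states receiving advice at t=0:", np.flatnonzero(advise[0]))
```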
A Neural Network Based Choice Model for Assortment Optimization
Wang, Hanzhao, Cai, Zhongze, Li, Xiaocheng, Talluri, Kalyan
Discrete-choice models are used in economics, marketing, and revenue management to predict customer purchase probabilities, say as a function of prices and other features of the offered assortment. While they have been shown to be expressive, capturing customer heterogeneity and behaviour, they are also hard to estimate, often relying on many unobservables such as utilities; moreover, they still fail to capture many salient features of customer behaviour. A natural question then, given their success in other contexts, is whether neural networks can eliminate the need for carefully building a context-dependent customer behaviour model and hand-coding and tuning the estimation. It is unclear, however, how one would incorporate assortment effects into such a neural network, and how one would optimize the assortment with such a black-box generative model of choice probabilities. In this paper, we first investigate whether a single neural network architecture can predict purchase probabilities for datasets from various contexts and generated under various models and assumptions. Next, we develop an assortment optimization formulation that is solvable by off-the-shelf integer programming solvers. We compare against a variety of benchmark discrete-choice models on simulated as well as real-world datasets, developing training tricks along the way to make the neural network prediction and the subsequent optimization robust and comparable in performance to the alternatives.
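To make the optimization target concrete, here is a brute-force sketch that searches all assortments for the one maximizing expected revenue under a black-box choice model; the toy logit-like scorer stands in for a trained network. The enumeration is only a baseline: the paper's contribution is to replace it with an integer-programming formulation that off-the-shelf solvers can handle at realistic scale.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)

# Placeholder for a trained neural choice model: maps an offered
# assortment (boolean mask over N items) to purchase probabilities.
# A random-utility toy with a no-purchase option stands in for the network.
N = 8
prices = rng.uniform(1, 10, size=N)
util = rng.normal(size=N)

def choice_probs(mask):
    w = np.where(mask, np.exp(util), 0.0)
    return w / (1.0 + w.sum())  # the 1.0 is the no-purchase option

# Expected revenue of an assortment under the (black-box) choice model.
revenue = lambda mask: float(prices @ choice_probs(mask))

# Brute force over all 2^N assortments; feasible only for tiny N.
best = max((np.array(m, dtype=bool)
            for m in itertools.product([0, 1], repeat=N)), key=revenue)
print("best assortment:", np.flatnonzero(best),
      "revenue:", round(revenue(best), 3))
```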
On Dynamic Pricing with Covariates
Wang, Hanzhao, Talluri, Kalyan, Li, Xiaocheng
We consider the dynamic pricing problem with covariates under a generalized linear demand model: a seller can dynamically adjust the price of a product over a horizon of $T$ time periods, and at each time period $t$, the demand of the product is jointly determined by the price and an observable covariate vector $x_t\in\mathbb{R}^d$ through an unknown generalized linear model. Most of the existing literature assumes the covariate vectors $x_t$'s are independently and identically distributed (i.i.d.); the few papers that relax this assumption either sacrifice model generality or yield sub-optimal regret bounds. In this paper, we show that a simple pricing algorithm achieves an $O(d\sqrt{T}\log T)$ regret upper bound without assuming any statistical structure on the covariates $x_t$ (which can even be arbitrarily chosen). The upper bound matches the lower bound (even under the i.i.d. assumption) up to logarithmic factors. Our paper thus shows that (i) the i.i.d. assumption is not necessary for obtaining low regret, and (ii) the regret bound can be independent of the (inverse) minimum eigenvalue of the covariance matrix of the $x_t$'s, a quantity present in previous bounds. Furthermore, we discuss a condition under which a better regret is achievable and show how a Thompson sampling algorithm can be applied to compute the prices efficiently.
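A minimal sketch of a greedy pricing loop of this flavor, assuming a logistic (binary-sale) instance of the generalized linear demand model: refit a maximum-likelihood estimate each period and charge the grid price that maximizes estimated expected revenue. The warm-up length, price grid, and sklearn's regularized fit are illustrative choices, not the paper's exact algorithm or its regret guarantee.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
d, T = 3, 300
beta_true = rng.normal(size=d)   # unknown demand parameters
alpha_true = 1.0                 # unknown price sensitivity
prices = np.linspace(0.5, 3.0, 20)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X_hist, y_hist, revenue = [], [], 0.0
for t in range(T):
    x_t = rng.normal(size=d)     # the covariates need not be i.i.d.
    if t < 20 or len(set(y_hist)) < 2:
        p_t = rng.choice(prices)  # short random warm-up
    else:
        # Refit the GLM by maximum likelihood on all observed data, then
        # charge the greedy price maximizing estimated revenue p * P(sale).
        model = LogisticRegression().fit(np.array(X_hist), np.array(y_hist))
        feats = np.array([np.append(x_t, -p) for p in prices])
        p_t = prices[np.argmax(prices * model.predict_proba(feats)[:, 1])]
    sale = rng.random() < sigmoid(x_t @ beta_true - alpha_true * p_t)
    X_hist.append(np.append(x_t, -p_t))
    y_hist.append(int(sale))
    revenue += p_t * sale
print(f"realized revenue over T={T} periods: {revenue:.1f}")
```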