
 Shi, Chengchun


Deep Distributional Learning with Non-crossing Quantile Network

arXiv.org Machine Learning

In this paper, we introduce a non-crossing quantile (NQ) network for conditional distribution learning. By leveraging non-negative activation functions, the NQ network ensures that the learned distributions remain monotonic, effectively addressing the issue of quantile crossing. Furthermore, the NQ network-based deep distributional learning framework is highly adaptable, applicable to a wide range of applications, from classical non-parametric quantile regression to more advanced tasks such as causal effect estimation and distributional reinforcement learning (RL). We also develop a comprehensive theoretical foundation for the deep NQ estimator and its application to distributional RL, providing an in-depth analysis that demonstrates its effectiveness across these domains. Our experimental results further highlight the robustness and versatility of the NQ network.
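
Below is a minimal sketch of the non-crossing idea only, not the paper's architecture: a quantile head outputs a base value plus cumulative increments passed through a non-negative activation (softplus here, as an assumption), so the K quantile estimates are monotone by construction.

import numpy as np

rng = np.random.default_rng(0)

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)) >= 0 for all x.
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def nq_head(features, W_base, W_inc):
    # Lowest quantile plus cumulative non-negative increments, so the
    # K outputs satisfy q_1 <= q_2 <= ... <= q_K by construction.
    base = features @ W_base                  # scalar: lowest quantile
    increments = softplus(features @ W_inc)   # K-1 non-negative gaps
    return base + np.concatenate([[0.0], np.cumsum(increments)])

# Toy check with random weights: the estimated quantiles never cross.
d, K = 5, 9
W_base = rng.normal(size=d)
W_inc = rng.normal(size=(d, K - 1))
q = nq_head(rng.normal(size=d), W_base, W_inc)
assert np.all(np.diff(q) >= 0)
print(np.round(q, 3))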


Statistical Inference in Reinforcement Learning: A Selective Survey

arXiv.org Machine Learning

Thus, the observed data can be summarized into a sequence of "observation-action-reward" triplets $(O_t, A_t, R_t)_{t \ge 0}$. It is worth noting that the observation $O_t$ at each time step is not equivalent to the environment's state $S_t$. Indeed, the state can be viewed as a special observation with the Markov property, and we will elaborate on the difference between the two later. Policies: The goal of RL is to learn an optimal policy $\pi$ based on the observation-action-reward triplets to maximize the agent's cumulative reward. Mathematically, a policy is defined as a conditional probability distribution function mapping the agent's observed data history to the action space. It specifies the probability of the agent taking different actions at each time step. Below, we introduce three types of policies (see Figure 1(b) for a visualization of their relationships): (1) History-dependent policy: This is the most general form of policy. At each time $t$, we define $H_t$ as the set containing the current observation $O_t$ and all prior historical information $(O_i, A_i, R_i)_{i<t}$.
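
As an illustration only (not taken from the survey), the sketch below encodes the observation-action-reward triplets as a simple data structure and implements a toy history-dependent policy: a map from the history $H_t$ (all past triplets plus the current observation) to a probability distribution over actions.

import random
from dataclasses import dataclass

@dataclass
class Triplet:
    observation: float
    action: int
    reward: float

def history_dependent_policy(history, current_obs, n_actions=2):
    # Toy rule: put 90% probability on the action with the highest average
    # past reward (action 0 is favoured when there is no history yet); the
    # current observation is ignored in this toy example.
    totals = [0.0] * n_actions
    counts = [1e-8] * n_actions
    for tr in history:
        totals[tr.action] += tr.reward
        counts[tr.action] += 1
    best = max(range(n_actions), key=lambda a: totals[a] / counts[a])
    probs = [0.1 / (n_actions - 1)] * n_actions
    probs[best] = 0.9
    return probs

# Roll out a short trajectory in a toy environment with a binary action.
history, obs = [], 0.0
for t in range(5):
    probs = history_dependent_policy(history, obs)
    action = random.choices(range(2), weights=probs)[0]
    reward = 1.0 if action == 1 else 0.0      # toy reward
    history.append(Triplet(obs, action, reward))
    obs = random.random()                     # next observation
print([(tr.action, tr.reward) for tr in history])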


Counterfactually Fair Reinforcement Learning via Sequential Data Preprocessing

arXiv.org Machine Learning

When applied in healthcare, reinforcement learning (RL) seeks to dynamically match the right interventions to subjects to maximize population benefit. However, the learned policy may disproportionately allocate efficacious actions to one subpopulation, creating or exacerbating disparities in other socioeconomically disadvantaged subgroups. These biases tend to occur in multi-stage decision making and can be self-perpetuating; if unaccounted for, they could cause serious unintended consequences that limit access to care or treatment benefit. Counterfactual fairness (CF) offers a promising statistical tool grounded in causal inference to formulate and study fairness. In this paper, we propose a general framework for fair sequential decision making. We theoretically characterize the optimal CF policy and prove its stationarity, which greatly simplifies the search for optimal CF policies by leveraging existing RL algorithms. The theory also motivates a sequential data preprocessing algorithm to achieve CF decision making under an additive noise assumption. We prove, and then validate through simulations, that our policy learning approach controls unfairness and attains the optimal value. Analysis of a digital health dataset designed to reduce opioid misuse shows that our proposal greatly enhances fair access to counseling.
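
A minimal sketch of the preprocessing intuition, under the additive noise assumption and with hypothetical variable names (not the paper's algorithm): regress each state variable on the sensitive attribute and retain only the residual, which is approximately insensitive to the attribute.

import numpy as np

rng = np.random.default_rng(6)
n = 1000
sensitive = rng.integers(0, 2, size=n)        # hypothetical sensitive attribute
noise = rng.normal(size=n)                    # exogenous noise shared across counterfactual worlds
state = 2.0 * sensitive + noise               # additive-noise state model

# Regress the state on the sensitive attribute and keep only the residual.
X = np.column_stack([np.ones(n), sensitive])
coef, *_ = np.linalg.lstsq(X, state, rcond=None)
residual = state - X @ coef

print("corr(state, sensitive):    %.3f" % np.corrcoef(state, sensitive)[0, 1])
print("corr(residual, sensitive): %.3f" % np.corrcoef(residual, sensitive)[0, 1])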


Dual Active Learning for Reinforcement Learning from Human Feedback

arXiv.org Machine Learning

Aligning large language models (LLMs) with human preferences is critical to recent advances in generative artificial intelligence. Reinforcement learning from human feedback (RLHF) is widely applied to achieve this objective. A key step in RLHF is to learn the reward function from human feedback. However, human feedback is costly and time-consuming, making it essential to collect high-quality conversation data for human teachers to label. Additionally, different human teachers have different levels of expertise. It is thus critical to query the most appropriate teacher for their opinions. In this paper, we use offline reinforcement learning (RL) to formulate the alignment problem. Motivated by the idea of $D$-optimal design, we first propose a dual active reward learning algorithm for the simultaneous selection of conversations and teachers. Next, we apply pessimistic RL to solve the alignment problem, based on the learned reward estimator. Theoretically, we show that the reward estimator obtained through our proposed adaptive selection strategy achieves minimal generalized variance asymptotically, and prove that the sub-optimality of our pessimistic policy scales as $O(1/\sqrt{T})$ with a given sample budget $T$. Through simulations and experiments on LLMs, we demonstrate the effectiveness of our algorithm and its superiority over state-of-the-art methods.
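
To make the $D$-optimal intuition concrete, here is a minimal greedy sketch (my illustration, not the paper's dual algorithm): each candidate (conversation, teacher) pair is represented by a hypothetical feature vector, and the next query is the one that most increases the log-determinant of the accumulated information matrix, i.e., most reduces the generalized variance of the reward estimator.

import numpy as np

rng = np.random.default_rng(1)
d = 4
candidates = rng.normal(size=(20, d))   # hypothetical features, one per (conversation, teacher) pair
info = 1e-3 * np.eye(d)                 # regularized information matrix

def d_optimal_pick(info, candidates):
    # Index maximizing log det(info + x x^T), i.e. the largest one-step
    # gain in the D-optimality criterion.
    scores = [np.linalg.slogdet(info + np.outer(x, x))[1] for x in candidates]
    return int(np.argmax(scores))

selected = []
for _ in range(5):                      # labelling budget of 5 queries
    i = d_optimal_pick(info, candidates)
    info += np.outer(candidates[i], candidates[i])
    selected.append(i)
print("selected query indices:", selected)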


Two-way Deconfounder for Off-policy Evaluation in Causal Reinforcement Learning

arXiv.org Machine Learning

Inspired by the two-way fixed effects regression model widely used in the panel data literature, we propose a two-way unmeasured confounding assumption to model the system dynamics in causal reinforcement learning. We then develop a two-way deconfounder algorithm that devises a neural tensor network to simultaneously learn both the unmeasured confounders and the system dynamics, based on which a model-based estimator can be constructed for consistent policy value estimation. We illustrate the effectiveness of the proposed estimator through theoretical results and numerical experiments.
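
A minimal sketch of the two-way structure only (the paper instead learns the confounders jointly with the dynamics via a neural tensor network): the unmeasured confounder is assumed to factor into subject and time components, so a rank-one decomposition of the noisy subject-by-time matrix recovers it.

import numpy as np

rng = np.random.default_rng(2)
n_subjects, n_times = 30, 20
u = rng.normal(size=n_subjects)           # latent subject effects
v = rng.normal(size=n_times)              # latent time effects
confounder = np.outer(u, v)               # two-way structure: u_i * v_t
noisy = confounder + 0.1 * rng.normal(size=confounder.shape)

# Rank-one reconstruction of the subject-by-time confounder matrix.
U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
recovered = s[0] * np.outer(U[:, 0], Vt[0])
rel_err = np.linalg.norm(recovered - confounder) / np.linalg.norm(confounder)
print("relative recovery error: %.3f" % rel_err)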


Forward and Backward State Abstractions for Off-policy Evaluation

arXiv.org Machine Learning

Off-policy evaluation (OPE) is crucial for evaluating a target policy's impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstractions - originally designed for policy learning - in the context of OPE. Our contributions are three-fold: (i) We define a set of irrelevance conditions central to learning state abstractions for OPE.


Combining Experimental and Historical Data for Policy Evaluation

arXiv.org Machine Learning

This paper studies policy evaluation with multiple data sources, especially in scenarios that involve one experimental dataset with two arms, complemented by a historical dataset generated under a single control arm. We propose novel data integration methods that linearly integrate base policy value estimators constructed based on the experimental and historical data, with weights optimized to minimize the mean square error (MSE) of the resulting combined estimator. We further apply the pessimistic principle to obtain more robust estimators, and extend these developments to sequential decision making. Theoretically, we establish non-asymptotic error bounds for the MSEs of our proposed estimators, and derive their oracle, efficiency and robustness properties across a broad spectrum of reward shift scenarios. Numerical experiments and real-data-based analyses from a ridesharing company demonstrate the superior performance of the proposed estimators.
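
The following sketch illustrates the linear-integration idea under simplifying assumptions (independent base estimators, known variances and bias, made-up numbers; not the paper's full procedure): the weight on the historical estimator is chosen to minimize the MSE of the combination.

import numpy as np

def optimal_weight(var_exp, var_hist, bias_hist):
    # Minimize MSE(w) = (1-w)^2 var_exp + w^2 (var_hist + bias_hist^2)
    # over the weight w placed on the historical estimator, assuming the
    # two base estimators are independent.
    return var_exp / (var_exp + var_hist + bias_hist ** 2)

rng = np.random.default_rng(3)
true_value = 1.0
exp_est = true_value + rng.normal(0.0, 0.5, size=10_000)         # unbiased, noisy
hist_est = true_value + 0.2 + rng.normal(0.0, 0.1, size=10_000)  # biased, precise

w = optimal_weight(var_exp=0.25, var_hist=0.01, bias_hist=0.2)
combined = (1 - w) * exp_est + w * hist_est
for name, est in [("experimental", exp_est), ("historical", hist_est),
                  ("combined", combined)]:
    print("%-12s MSE: %.4f" % (name, np.mean((est - true_value) ** 2)))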


An Analysis of Switchback Designs in Reinforcement Learning

arXiv.org Machine Learning

This paper offers a detailed investigation of switchback designs in A/B testing, which alternate between baseline and new policies over time. Our aim is to thoroughly evaluate the effects of these designs on the accuracy of the resulting average treatment effect (ATE) estimators. We propose a novel "weak signal analysis" framework, which substantially simplifies the calculation of the mean squared errors (MSEs) of these ATE estimators in Markov decision process environments. Our findings suggest that (i) when the majority of reward errors are positively correlated, the switchback design is more efficient than the alternating-day design, which switches policies on a daily basis. Additionally, increasing the frequency of policy switches tends to reduce the MSE of the ATE estimator. (ii) When the errors are uncorrelated, however, all these designs become asymptotically equivalent. (iii) In cases where the majority of errors are negatively correlated, the alternating-day design becomes the optimal choice. These insights are crucial, offering guidelines for practitioners on designing experiments in A/B testing. Our analysis accommodates a variety of policy value estimators, including model-based estimators, least squares temporal difference learning estimators, and double reinforcement learning estimators, thereby offering a comprehensive understanding of optimal design strategies for policy evaluation in reinforcement learning.
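
For concreteness, a minimal sketch (illustrative, not the paper's analysis) of the two designs being compared: a switchback design flips between the baseline and new policy every few periods within a day, while the alternating-day design flips once per day.

import numpy as np

def switchback_design(periods_per_day, n_days, switch_every):
    # 0/1 policy indicator per period, flipping every `switch_every` periods.
    total = periods_per_day * n_days
    return (np.arange(total) // switch_every) % 2

def alternating_day_design(periods_per_day, n_days):
    # 0/1 policy indicator per period, flipping once per day.
    day = np.arange(periods_per_day * n_days) // periods_per_day
    return day % 2

print(switchback_design(periods_per_day=8, n_days=2, switch_every=2))
print(alternating_day_design(periods_per_day=8, n_days=2))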


Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data

arXiv.org Artificial Intelligence

In real-world scenarios, datasets collected from randomized experiments are often constrained in size, due to limitations in time and budget. As a result, leveraging large observational datasets becomes a more attractive option for achieving high-quality policy learning. However, most existing offline reinforcement learning (RL) methods depend on two key assumptions--unconfoundedness and positivity--which frequently do not hold in observational data contexts. Recognizing these challenges, we propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL). We utilize a mediator variable, based on the front-door criterion, to remove the confounding bias; additionally, we adopt the pessimistic principle to address the distributional shift between the action distributions induced by candidate policies and that of the behavior policy that generates the observational data. Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function, to partially mitigate the issue of distributional shift. This insight significantly simplifies our algorithm by circumventing the challenging task of sequential uncertainty quantification for the estimated Q-function. Moreover, we provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
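
The pessimistic ingredient can be illustrated with a simple stand-in (not PESCAL itself): form a uniform lower confidence bound for an estimated distribution function, here the empirical CDF of a hypothetical mediator sample using a Dvoretzky-Kiefer-Wolfowitz band.

import numpy as np

rng = np.random.default_rng(4)
mediator_sample = rng.normal(size=200)        # stand-in for observed mediators
grid = np.linspace(-3, 3, 7)

n, alpha = len(mediator_sample), 0.05
eps = np.sqrt(np.log(2 / alpha) / (2 * n))    # DKW uniform band half-width
ecdf = np.array([(mediator_sample <= x).mean() for x in grid])
lower = np.clip(ecdf - eps, 0.0, 1.0)         # pessimistic lower bound on the CDF

print("empirical CDF:", np.round(ecdf, 3))
print("lower bound:  ", np.round(lower, 3))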


Policy Evaluation for Temporal and/or Spatial Dependent Experiments

arXiv.org Machine Learning

The aim of this paper is to establish a causal link between the policies implemented by technology companies and the outcomes they yield within intricate experiments exhibiting temporal and/or spatial dependence. We propose a novel temporal/spatio-temporal Varying Coefficient Decision Process (VCDP) model, capable of effectively capturing the evolving treatment effects in situations characterized by temporal and/or spatial dependence. Our methodology encompasses the decomposition of the Average Treatment Effect (ATE) into the Direct Effect (DE) and the Indirect Effect (IE). We subsequently devise comprehensive procedures for estimating and making inferences about both DE and IE. Additionally, we provide a rigorous analysis of the statistical properties of these procedures, such as asymptotic power. To substantiate the effectiveness of our approach, we carry out extensive simulations and real data analyses.
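
A minimal sketch of the varying-coefficient idea only (simulated data; not the paper's VCDP estimator or its DE/IE decomposition): the treatment effect is allowed to change with time and is recovered by interacting the treatment indicator with basis functions of time.

import numpy as np

rng = np.random.default_rng(5)
n = 2000
t = rng.uniform(0.0, 1.0, size=n)             # time of each observation
a = rng.integers(0, 2, size=n)                # treatment indicator
beta_true = 1.0 + 2.0 * t                     # time-varying treatment effect
y = 0.5 + beta_true * a + rng.normal(size=n)  # simulated outcome

# Design matrix: intercept, time trend, treatment, and treatment x time.
X = np.column_stack([np.ones(n), t, a, a * t])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated beta(t) = %.2f + %.2f t   (truth: 1.00 + 2.00 t)"
      % (coef[2], coef[3]))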