
 Shi, Chengchun


Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

arXiv.org Machine Learning

We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators and fitted-Q evaluation suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs. Future-dependent value functions play roles similar to those of classical value functions in fully observable MDPs. We derive a new Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables. We further propose a minimax learning method to learn future-dependent value functions from this new Bellman equation. We obtain a PAC result, which implies that our OPE estimator is consistent as long as futures and histories contain sufficient information about latent states and the Bellman completeness condition holds. Finally, we extend our method to learning the dynamics and establish the connection between our approach and the well-known spectral learning methods for POMDPs.
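As a schematic only (not the paper's exact formulation): let $\delta(V; O_t)$ denote a Bellman-type residual built from the reward, discount factor, and future proxies, so that the new Bellman equation reads $\mathbb{E}[\delta(V; O_t) \mid H_t] = 0$, with the history proxy $H_t$ acting as the instrument. A standard minimax estimator for such conditional moment restrictions, over a value class $\mathcal{V}$ and a critic class $\mathcal{F}$, is
\[
\widehat{V} \;=\; \arg\min_{V \in \mathcal{V}} \;\max_{f \in \mathcal{F}} \;
\frac{1}{n}\sum_{t=1}^{n} \Bigl[ \delta(V; O_t)\, f(H_t) \;-\; \tfrac{1}{2} f(H_t)^2 \Bigr],
\]
where the inner maximization screens for violations of the moment condition along every direction $f$ in the critic class; the residual $\delta$ and the function classes here are placeholders rather than the paper's definitions.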


Robust Offline Policy Evaluation and Optimization with Heavy-Tailed Rewards

arXiv.org Machine Learning

This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation (OPE) and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-means method with offline RL, enabling straightforward uncertainty estimation for the value function estimator. This not only adheres to the principle of pessimism in OPO but also adeptly manages heavy-tailed rewards. Theoretical results and extensive experiments demonstrate that our two frameworks outperform existing methods when the logged dataset exhibits heavy-tailed reward distributions.
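To make the central ingredient concrete, here is a minimal sketch of the median-of-means idea on which ROAM and ROOM build; the value-function estimation details are omitted, and the function below is illustrative rather than part of any released implementation. The spread of the block means also gives a simple measure of uncertainty, which is the kind of quantity the frameworks exploit for pessimism.

```python
import numpy as np

def median_of_means(x, n_blocks=10, seed=0):
    """Median-of-means estimate of E[x]: randomly split the sample into
    blocks, average within each block, and return the median of the block
    means. Far more robust to heavy-tailed data than the plain sample mean."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    blocks = np.array_split(x[rng.permutation(len(x))], n_blocks)
    block_means = np.array([b.mean() for b in blocks])
    return np.median(block_means), block_means.std()  # estimate, rough spread

# Heavy-tailed rewards: Student-t with 2 degrees of freedom
rewards = np.random.default_rng(1).standard_t(df=2, size=10_000)
print(median_of_means(rewards))
```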


Blessing from Human-AI Interaction: Super Reinforcement Learning in Confounded Environments

arXiv.org Artificial Intelligence

As AI becomes more prevalent throughout society, effective methods of integrating humans and AI systems that leverage their respective strengths and mitigate risk have become an important priority. In this paper, we introduce the paradigm of super reinforcement learning, which takes advantage of human-AI interaction for data-driven sequential decision making. This approach utilizes the observed action, whether from AI or humans, as input for achieving a stronger oracle in policy learning for the decision maker (humans or AI). In a decision process with unmeasured confounding, the actions taken by past agents can offer valuable insights into undisclosed information. By incorporating this information into the policy search in a novel and legitimate manner, the proposed super reinforcement learning yields a super-policy that is guaranteed to outperform both the standard optimal policy and the behavior policy (i.e., past agents' actions). We call this stronger oracle a blessing from human-AI interaction. Furthermore, to address the issue of unmeasured confounding when finding super-policies from batch data, a number of nonparametric and causal identification results are established. Building on these novel identification results, we develop several super-policy learning algorithms and systematically study their theoretical properties, such as finite-sample regret guarantees. Finally, we illustrate the effectiveness of our proposal through extensive simulations and real-world applications.
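In the notation of this summary (not the paper's), the construction can be sketched as follows: a standard policy maps states to action distributions, while a super-policy additionally conditions on the observed behavior action $a^{b}$,
\[
\Pi = \{\pi : \mathcal{S} \to \Delta(\mathcal{A})\},
\qquad
\Pi^{\mathrm{super}} = \{\pi : \mathcal{S} \times \mathcal{A} \to \Delta(\mathcal{A})\}.
\]
Since any standard policy (ignore $a^{b}$) and the behavior policy (copy $a^{b}$) both belong to $\Pi^{\mathrm{super}}$, the best super-policy is at least as good as either, which is the sense in which the observed action is a blessing; the identification results are what make this comparison attainable from confounded batch data.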


Off-policy Evaluation in Doubly Inhomogeneous Environments

arXiv.org Artificial Intelligence

Reinforcement learning (RL, Sutton and Barto, 2018) aims to optimize an agent's long-term reward by learning an optimal policy that determines the best action to take under every circumstance. RL is closely related to dynamic treatment regimes (DTRs) or adaptive treatment strategies in statistical research on precision medicine (Murphy, 2003; Robins, 2004; Qian and Murphy, 2011; Kosorok and Moodie, 2015; Tsiatis et al., 2019; Qi et al., 2020; Zhou et al., 2022a), which seek to obtain, in finite-horizon settings with a few treatment stages, the optimal treatment policy that maximizes patients' expected outcomes. Nevertheless, the statistical methods for DTRs mentioned above typically cannot handle large or infinite horizon settings: they require the number of trajectories to tend to infinity to achieve estimation consistency, unlike RL, which works even with a finite number of trajectories under certain conditions. In addition to precision medicine, RL has been applied to various fields, such as games (Silver et al., 2016), ridesharing (Xu et al., 2018), mobile health (Liao et al., 2021) and robotics (Levine et al., 2020).


A Reinforcement Learning Framework for Dynamic Mediation Analysis

arXiv.org Machine Learning

Mediation analysis learns the causal effect transmitted from treatments to outcomes via mediator variables, and has received increasing attention in various scientific domains as a tool for elucidating causal relations. Most existing works focus on point-exposure studies where each subject only receives one treatment at a single time point. However, there are a number of applications (e.g., mobile health) where the treatments are sequentially assigned over time and the dynamic mediation effects are of primary interest. We propose a reinforcement learning (RL) framework and are the first to evaluate dynamic mediation effects in infinite-horizon settings. We decompose the average treatment effect into an immediate direct effect, an immediate mediation effect, a delayed direct effect, and a delayed mediation effect. Upon identifying each effect component, we further develop robust and semi-parametrically efficient estimators under the RL framework to infer these causal effects. The superior performance of the proposed method is demonstrated through extensive numerical studies, theoretical results, and an analysis of a mobile health dataset.
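Schematically, and in notation introduced here rather than taken from the paper, the decomposition reads
\[
\mathrm{ATE} \;=\; \mathrm{IDE} + \mathrm{IME} + \mathrm{DDE} + \mathrm{DME},
\]
where IDE and IME denote the immediate direct and mediation effects on the current outcome, and DDE and DME denote the delayed direct and mediation effects carried to future outcomes through the system dynamics.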


Testing for the Markov Property in Time Series via Deep Conditional Generative Learning

arXiv.org Artificial Intelligence

The Markov property is widely imposed in the analysis of time series data. Correspondingly, testing the Markov property, and relatedly, inferring the order of a Markov model, are of paramount importance. In this article, we propose a nonparametric test for the Markov property in high-dimensional time series via deep conditional generative learning. We also apply the test sequentially to determine the order of the Markov model. We show that the test controls the type-I error asymptotically and has power approaching one. Our proposal makes novel contributions in several ways. We utilize and extend state-of-the-art deep generative learning to estimate the conditional density functions, and establish a sharp upper bound on the approximation error of the estimators. We derive a doubly robust test statistic, which employs nonparametric estimation but achieves a parametric convergence rate. We further adopt sample splitting and cross-fitting to minimize the conditions required to ensure the consistency of the test. We demonstrate the efficacy of the test through both simulations and three data applications.
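The sequential use of the test to infer the order can be sketched as follows; `markov_test` below is a placeholder for the paper's deep-generative conditional-independence test and is simply assumed to return a p-value.

```python
def select_markov_order(series, max_order=10, alpha=0.05, markov_test=None):
    """Sequentially test H0: the series is Markov of order k, for k = 1, 2, ...
    Return the smallest order that is not rejected at level alpha."""
    for k in range(1, max_order + 1):
        p_value = markov_test(series, order=k)
        if p_value > alpha:        # fail to reject: order k looks adequate
            return k
    return max_order               # fall back if every order is rejected
```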


Evaluating Dynamic Conditional Quantile Treatment Effects with Applications in Ridesharing

arXiv.org Artificial Intelligence

Many modern tech companies, such as Google, Uber, and Didi, utilize online experiments (also known as A/B testing) to evaluate new policies against existing ones. While most studies concentrate on average treatment effects, situations with skewed and heavy-tailed outcome distributions may benefit from alternative criteria, such as quantiles. However, assessing dynamic quantile treatment effects (QTE) remains a challenge, particularly when dealing with data from ride-sourcing platforms that involve sequential decision-making across time and space. In this paper, we establish a formal framework to calculate QTE conditional on characteristics independent of the treatment. Under specific model assumptions, we demonstrate that the dynamic conditional QTE (CQTE) equals the sum of individual CQTEs across time, even though the conditional quantile of cumulative rewards may not necessarily equate to the sum of conditional quantiles of individual rewards. This crucial insight significantly streamlines the estimation and inference processes for our target causal estimand. We then introduce two varying coefficient decision process (VCDP) models and devise an innovative method to test the dynamic CQTE. Moreover, we expand our approach to accommodate data from spatiotemporal dependent experiments and examine both conditional quantile direct and indirect effects. To showcase the practical utility of our method, we apply it to three real-world datasets from a ride-sourcing platform. Theoretical findings and comprehensive simulation studies further substantiate our proposal.
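In symbols (introduced here for illustration), with $Q_\tau(\cdot \mid x)$ the conditional $\tau$-th quantile and $R_t$ the reward at time $t$, the key insight is that under the stated model assumptions
\[
\mathrm{CQTE}_{\tau}(x) \;=\; \sum_{t=1}^{T} \mathrm{CQTE}_{t,\tau}(x),
\qquad\text{even though in general}\qquad
Q_\tau\!\Bigl(\textstyle\sum_{t=1}^{T} R_t \,\Big|\, x\Bigr) \;\neq\; \sum_{t=1}^{T} Q_\tau(R_t \mid x).
\]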


A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets

arXiv.org Artificial Intelligence

This paper concerns applications in two-sided markets that involve a group of subjects making sequential decisions across time and/or location. In particular, we consider large-scale fleet management in ride-sharing companies, such as Uber, Lyft and Didi. These companies form a typical two-sided market that enables efficient interactions between passengers and drivers (Armstrong, 2006; Rysman, 2009). With the rapid development of smartphones and the internet of things, these platforms have substantially transformed the transportation landscape (Frenken and Schor, 2017; Jin et al., 2018; Hagiu and Wright, 2019). With rich information on passenger demand and the locations of taxi drivers, they significantly reduce taxi cruise time and passenger waiting time in comparison to traditional taxi systems (Li et al., 2011; Zhang et al., 2014; Miao et al., 2016). We use the numbers of drivers and call orders to measure the supply and demand at a given time and location. Both supply and demand are spatio-temporal processes, and they interact with each other. These processes depend strongly on the platform's policies and have a huge impact on the platform's outcomes of interest, such as drivers' income level and working time, passengers' satisfaction rate, the order answering rate and the order finishing rate.


Sequential Knockoffs for Variable Selection in Reinforcement Learning

arXiv.org Artificial Intelligence

Interest in reinforcement learning (RL, Sutton & Barto 2018) has increased dramatically in recent years due in part to a number of high-profile successes in games (Mnih et al. 2013, 2015), autonomous driving (Sallab et al. 2017), and precision medicine (Tsiatis et al. 2019). However, despite theoretical and computational advances, real-world application of RL remains difficult. A primary challenge is dealing with high-dimensional state representations. Such representations occur naturally in systems with high-dimensional measurements, like images or audio, but can also occur when the system state is constructed by concatenating a series of measurements over a contiguous block of time. A high-dimensional state, when a more parsimonious one would suffice, dilutes the efficiency of learning algorithms and makes the estimated optimal policy harder to interpret. Thus, methods for removing uninformative or redundant variables from the state are of tremendous practical value. We develop a general variable selection algorithm for offline RL, which aims to learn an optimal policy using only logged data, i.e., without any additional online interaction. Our contributions can be summarized as follows: (i) we formally define a minimal sufficient state for an MDP and argue that it is an appropriate target by which to design and evaluate variable selection methods in RL; (ii) we show that naïve variable selection methods based on the state or reward alone need not recover the minimal sufficient state; (iii) we propose a novel sequential knockoffs (SEEK) algorithm that applies with general black-box learning methods and, under a β-mixing condition, consistently recovers the minimal sufficient state and controls the false discovery rate (FDR, the ratio of the number of selected irrelevant variables to the number of selected variables); and (iv) we develop a novel algorithm to estimate the β-mixing coefficients of an MDP. The algorithm in (iv) is important in its own right, as it applies to a number of applications beyond RL (McDonald et al. 2015).
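For reference, in the usual knockoffs convention the false discovery rate controlled by SEEK can be written as
\[
\mathrm{FDR} \;=\; \mathbb{E}\!\left[\frac{\#\{\text{selected variables that are irrelevant}\}}{\#\{\text{selected variables}\} \vee 1}\right],
\]
the expected proportion of selected variables that are not part of the minimal sufficient state.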


Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach

arXiv.org Artificial Intelligence

In this article, we propose a novel pessimism-based Bayesian learning method for optimal dynamic treatment regimes in the offline setting. When the coverage condition does not hold, which is common for offline data, existing solutions would produce sub-optimal policies. The pessimism principle addresses this issue by discouraging the recommendation of actions that are less explored conditional on the state. However, nearly all pessimism-based methods rely on a key hyper-parameter that quantifies the degree of pessimism, and the performance of these methods can be highly sensitive to the choice of this parameter. We propose to integrate the pessimism principle with Thompson sampling and Bayesian machine learning to optimize the degree of pessimism. We derive a credible set whose boundary uniformly lower bounds the optimal Q-function, so that no additional tuning of the degree of pessimism is required. We develop a general Bayesian learning method that works with a range of models, from Bayesian linear basis models to Bayesian neural network models. We develop a computational algorithm based on variational inference, which is highly efficient and scalable. We establish the theoretical guarantees of the proposed method and show empirically that it outperforms existing state-of-the-art solutions through both simulations and a real data example.
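A minimal sketch of how a credible lower bound can drive action selection is given below; this is illustrative only, and `posterior_q_samples` is a hypothetical stand-in for the paper's variational posterior and credible-set construction.

```python
import numpy as np

def pessimistic_greedy_action(state, actions, posterior_q_samples, level=0.05):
    """Pick the action maximizing a lower credible bound on Q(state, action).
    Acting on the lower bound, rather than the posterior mean, discourages
    poorly explored actions, which is the pessimism principle."""
    lower_bounds = [
        np.quantile(np.asarray(posterior_q_samples(state, a)), level)
        for a in actions
    ]
    return actions[int(np.argmax(lower_bounds))]
```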