Shi, Chengchun
Conformal Off-policy Prediction
Zhang, Yingying, Shi, Chengchun, Luo, Shikai
Off-policy evaluation is critical in a number of applications where new policies need to be evaluated offline before online deployment. Most existing methods focus on the expected return, define the target parameter through averaging and provide a point estimator only. In this paper, we develop a novel procedure to produce reliable interval estimators for a target policy's return starting from any initial state. Our proposal accounts for the variability of the return around its expectation, focuses on the individual effect and offers valid uncertainty quantification. Our main idea lies in designing a pseudo policy that generates subsamples as if they were sampled from the target policy, so that existing conformal prediction algorithms are applicable to prediction interval construction. Our methods are justified theoretically and validated on synthetic data and real data from short-video platforms.
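To make the idea concrete, here is a minimal sketch (in a one-step, contextual-bandit setting rather than the paper's general MDP setup) of resampling logged data so that the retained samples mimic the target policy, followed by split conformal prediction on the outcomes. The simulated data, the deterministic target policy and the random-forest regressor are illustrative assumptions, not the paper's procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Logged data from a uniform behavior policy over two actions (assumed setup).
n = 5000
states = rng.normal(size=(n, 3))
actions = rng.integers(0, 2, size=n)
rewards = states[:, 0] * actions + rng.normal(scale=0.5, size=n)

def target_policy(s):
    """Deterministic target policy: action 1 iff the first state coordinate is positive."""
    return (s[:, 0] > 0).astype(int)

# Pseudo-policy step: keep the transitions whose logged action agrees with the
# target policy, so the retained subsample looks as if it came from the target.
keep = actions == target_policy(states)
S, R = states[keep], rewards[keep]

# Split conformal prediction on the pseudo target-policy subsample.
m = len(R) // 2
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(S[:m], R[:m])
resid = np.abs(R[m:] - model.predict(S[m:]))
alpha = 0.1
level = np.ceil((1 - alpha) * (len(resid) + 1)) / len(resid)
q = np.quantile(resid, level)

# Prediction interval for the outcome starting from a new initial state.
s_new = np.array([[0.5, 0.0, 0.0]])
pred = model.predict(s_new)[0]
print(f"90% prediction interval: [{pred - q:.2f}, {pred + q:.2f}]")
```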
An Instrumental Variable Approach to Confounded Off-Policy Evaluation
Xu, Yang, Zhu, Jin, Shi, Chengchun, Luo, Shikai, Song, Rui
Offline policy evaluation (OPE) estimates the discounted cumulative reward following a given target policy with an offline dataset collected from another (possibly unknown) behavior policy. OPE is important in situations where it is impractical or too costly to directly evaluate the target policy via online experimentation, including robotics (Quillen et al., 2018), precision medicine (Murphy, 2003; Kosorok and Laber, 2019; Tsiatis et al., 2019), economics, quantitative social science (Abadie and Cattaneo, 2018), recommendation systems (Li et al., 2010; Kiyohara et al., 2022), etc. Despite a large body of literature on OPE (see Section 2 for detailed discussions), many existing methods rely on the assumption of no unmeasured confounders (NUC), which excludes unobserved variables that could confound either the action-reward or action-next-state pair. This assumption, however, can be violated in real-world applications such as healthcare and the technology industry. Our paper is partly motivated by the need to evaluate the long-term treatment effects of certain app download ads on a short-video platform.
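As background for why an instrument helps, the following minimal sketch shows the classical single-stage (bandit-like) case: a naive contrast is biased by an unmeasured confounder, while the Wald/IV estimator recovers the true effect. This is standard instrumental-variable reasoning, not the sequential method developed in the paper, and the simulated data-generating process is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
u = rng.normal(size=n)                      # unmeasured confounder
z = rng.integers(0, 2, size=n)              # binary instrument (e.g., an encouragement)
a = (0.8 * z + 0.8 * u + rng.normal(size=n) > 0.8).astype(float)  # confounded action
y = 2.0 * a + 1.5 * u + rng.normal(size=n)                        # true effect = 2.0

naive = y[a == 1].mean() - y[a == 0].mean()   # biased upward by the confounder u
wald = (y[z == 1].mean() - y[z == 0].mean()) / (a[z == 1].mean() - a[z == 0].mean())
print(f"naive estimate: {naive:.2f}, IV (Wald) estimate: {wald:.2f}")
```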
Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization
Shi, Chengchun, Qi, Zhengling, Wang, Jianing, Zhou, Fan
Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy that maximizes the cumulative reward in sequential decision making. Most methods in the existing literature are developed in online settings where the data are easy to collect or simulate. Motivated by high-stakes domains such as mobile health studies with limited and pre-collected data, in this paper we study offline reinforcement learning methods. To use these datasets efficiently for policy optimization, we propose a novel value enhancement method to improve the performance of a given initial policy computed by existing state-of-the-art RL algorithms. Specifically, when the initial policy is not consistent, our method outputs a policy whose value is no worse and often better than that of the initial policy. When the initial policy is consistent, under some mild conditions, our method yields a policy whose value converges to the optimal one at a faster rate than the initial policy, achieving the desired "value enhancement" property. The proposed method is generally applicable to any parametrized policy that belongs to a pre-specified function class (e.g., deep neural networks). Extensive numerical studies are conducted to demonstrate the superior performance of our method.
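For intuition, the classical guarantee behind such value enhancement is one-step policy improvement: evaluating the initial policy and then acting greedily with respect to its Q-function never decreases the value. The sketch below illustrates this in a toy tabular MDP with exact computations; the MDP, the uniform initial policy and the discount factor are assumptions, and this is not the paper's trust-region algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
nS, nA, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # transition probabilities P[s, a, s']
Rmat = rng.normal(size=(nS, nA))                  # mean rewards
pi0 = np.full((nS, nA), 1.0 / nA)                 # initial (uniform) policy

def policy_value(pi):
    """Exact value of a policy in this tabular MDP (for checking improvement)."""
    Ppi = np.einsum('sab,sa->sb', P, pi)
    rpi = (Rmat * pi).sum(axis=1)
    return np.linalg.solve(np.eye(nS) - gamma * Ppi, rpi)

# Q-function of the initial policy, then the greedy (improved) policy.
V0 = policy_value(pi0)
Q0 = Rmat + gamma * P @ V0
pi1 = np.eye(nA)[Q0.argmax(axis=1)]

print("initial values: ", np.round(policy_value(pi0), 2))
print("improved values:", np.round(policy_value(pi1), 2))   # elementwise no worse
```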
Deep Spectral Q-learning with Application to Mobile Health
Gao, Yuhe, Shi, Chengchun, Song, Rui
Precision medicine focuses on providing personalized treatment to patients by taking their personal information into consideration (see e.g., Kosorok and Laber, 2019; Tsiatis et al., 2019). It has found applications in numerous studies, ranging from cardiovascular disease to cancer treatment and gene therapy (Jameson and Longo, 2015). A dynamic treatment regime (DTR) consists of a sequence of treatment decision rules tailored to each individual patient's status at each time, mathematically formalizing the idea behind precision medicine. One of the major objectives in precision medicine is to identify the optimal dynamic treatment regime that yields the most favorable outcome on average. With the rapid development of mobile health (mHealth) technology, it has become feasible to collect rich longitudinal data through mobile apps in medical studies.
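As a reminder of how an optimal DTR is typically estimated, the sketch below runs classical backward-induction Q-learning for a two-stage regime with linear working models. The simulated two-stage data-generating process is an assumption, and this is the textbook approach rather than the deep spectral method proposed in the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 5000
x1 = rng.normal(size=n)
a1 = rng.integers(0, 2, size=n)
x2 = 0.5 * x1 + 0.3 * a1 + rng.normal(scale=0.5, size=n)
a2 = rng.integers(0, 2, size=n)
y = x2 * (2 * a2 - 1) + x1 * (2 * a1 - 1) + rng.normal(size=n)   # final outcome

# Stage 2: regress the outcome on (x1, a1, x2, a2) and take the better a2.
X2 = np.column_stack([x1, a1, x2, a2, x2 * a2])
q2 = LinearRegression().fit(X2, y)

def v2(x1v, a1v, x2v):
    """Stage-2 optimal value: best predicted outcome over the two a2 options."""
    cand = [q2.predict(np.column_stack([x1v, a1v, x2v, np.full_like(x2v, a),
                                        x2v * a])) for a in (0, 1)]
    return np.maximum(*cand)

# Stage 1: regress the stage-2 optimal value on (x1, a1) and take the better a1.
X1 = np.column_stack([x1, a1, x1 * a1])
q1 = LinearRegression().fit(X1, v2(x1, a1, x2))
x_new = 1.2
best_a1 = max((0, 1), key=lambda a: q1.predict([[x_new, a, x_new * a]])[0])
print("recommended first-stage treatment:", best_a1)
```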
Quantile Off-Policy Evaluation via Deep Conditional Generative Learning
Xu, Yang, Shi, Chengchun, Luo, Shikai, Wang, Lan, Song, Rui
Off-policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy. It is critical in a number of sequential decision making problems ranging from healthcare to the technology industry. Most of the existing literature focuses on evaluating the mean outcome of a given policy and ignores the variability of the outcome. However, in a variety of applications, criteria other than the mean may be more sensible. For example, when the reward distribution is skewed and asymmetric, quantile-based metrics are often preferred for their robustness. In this paper, we propose a doubly-robust inference procedure for quantile OPE in sequential decision making and study its asymptotic properties. In particular, we propose utilizing state-of-the-art deep conditional generative learning methods to handle parameter-dependent nuisance function estimation. We demonstrate the advantages of the proposed estimator through both simulations and a real-world dataset from a short-video platform. In particular, we find that our proposed estimator outperforms classical OPE estimators for the mean in settings with heavy-tailed reward distributions.
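A minimal sketch of the quantile-OPE idea in its simplest one-step form: reweight logged outcomes by importance ratios and read off a weighted empirical quantile. The paper's doubly-robust procedure with deep conditional generative nuisance estimates is considerably more involved; the simulated bandit data and the target policy below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50000
s = rng.normal(size=n)
a = rng.integers(0, 2, size=n)                      # behavior: uniform over {0, 1}
r = np.where(a == 1, s + rng.standard_t(df=3, size=n), rng.normal(size=n))

pi_b = 0.5                                          # behavior propensity
pi_t = np.where(s > 0, 0.9, 0.1)                    # target: P(a = 1 | s)
w = np.where(a == 1, pi_t, 1 - pi_t) / pi_b         # importance weights

def weighted_quantile(x, weights, tau):
    """tau-quantile of the weighted empirical distribution of x."""
    order = np.argsort(x)
    cw = np.cumsum(weights[order]) / weights.sum()
    return x[order][np.searchsorted(cw, tau)]

print("estimated median reward under the target policy:",
      round(weighted_quantile(r, w, 0.5), 3))
```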
A Review of Off-Policy Evaluation in Reinforcement Learning
Uehara, Masatoshi, Shi, Chengchun, Kallus, Nathan
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has recently been applied to solve a number of challenging problems. In this paper, we primarily focus on off-policy evaluation (OPE), one of the most fundamental topics in RL. In recent years, a number of OPE methods have been developed in the statistics and computer science literature. We provide a discussion of the efficiency bound of OPE, some of the existing state-of-the-art OPE methods and their statistical properties, and some other related research directions that are being actively explored.
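As one concrete example of the estimators such a review covers, the sketch below implements the model-based (direct) method in a toy tabular MDP: estimate the transition and reward models from logged data, then solve the Bellman equation under the target policy. The environment, policies and discount factor are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(10)
nS, nA, gamma, n = 3, 2, 0.9, 100000
P = rng.dirichlet(np.ones(nS), size=(nS, nA))        # true dynamics (unknown to estimator)
Rmat = rng.normal(size=(nS, nA))
pi_b = np.full((nS, nA), 0.5)                        # behavior: uniform
pi_t = rng.dirichlet(np.ones(nA), size=nS)           # target policy

# Collect logged transitions under the behavior policy and fit P_hat, R_hat.
counts = np.zeros((nS, nA, nS))
rsum = np.zeros((nS, nA))
s = rng.integers(nS)
for _ in range(n):
    a = rng.choice(nA, p=pi_b[s])
    s_next = rng.choice(nS, p=P[s, a])
    counts[s, a, s_next] += 1
    rsum[s, a] += Rmat[s, a] + rng.normal(scale=0.1)
    s = s_next
n_sa = np.maximum(counts.sum(axis=2), 1)             # guard against unvisited pairs
P_hat = counts / n_sa[:, :, None]
R_hat = rsum / n_sa

# Solve V = r_pi + gamma * P_pi V under the target policy using the fitted model.
P_pi = np.einsum('sab,sa->sb', P_hat, pi_t)
r_pi = (R_hat * pi_t).sum(axis=1)
V = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)
print("direct-method value estimate (uniform initial state):", round(V.mean(), 3))
```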
Jump Interval-Learning for Individualized Decision Making
Cai, Hengrui, Shi, Chengchun, Song, Rui, Lu, Wenbin
An individualized decision rule (IDR) is a decision function that assigns each individual a given treatment based on his/her observed characteristics. Most of the existing works in the literature consider settings with binary or finitely many treatment options. In this paper, we focus on the continuous treatment setting and propose jump interval-learning to develop an individualized interval-valued decision rule (I2DR) that maximizes the expected outcome. Unlike IDRs that recommend a single treatment, the proposed I2DR yields an interval of treatment options for each individual, making it more flexible to implement in practice. To derive an optimal I2DR, our jump interval-learning method estimates the conditional mean of the outcome given the treatment and the covariates via jump penalized regression, and derives the corresponding optimal I2DR based on the estimated outcome regression function. The regressor is allowed to be either linear, for clear interpretation, or a deep neural network, to model complex treatment-covariate interactions. To implement jump interval-learning, we develop a searching algorithm based on dynamic programming that efficiently computes the outcome regression function. Statistical properties of the resulting I2DR are established when the outcome regression function is either a piecewise or a continuous function over the treatment space. We further develop a procedure to infer the mean outcome under the (estimated) optimal policy. Extensive simulations and a real data application to a warfarin study are conducted to demonstrate the empirical validity of the proposed I2DR.
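The core computational step, jump penalized regression, can be illustrated with a one-dimensional dynamic program: segment the treatment axis into pieces with constant fitted means, paying a penalty per jump, and then recommend the interval with the largest fitted outcome. The sketch below ignores covariates and uses an assumed toy dose-response curve and penalty; it illustrates the segmentation idea only, not the full I2DR procedure.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
dose = np.sort(rng.uniform(0, 1, size=n))                        # treatment values
mean = np.where(dose < 0.4, 1.0, np.where(dose < 0.7, 2.5, 0.5)) # piecewise truth
y = mean + rng.normal(scale=0.3, size=n)

lam = 2.0                                                        # jump penalty
csum = np.concatenate([[0.0], np.cumsum(y)])
csq = np.concatenate([[0.0], np.cumsum(y ** 2)])

def sse(i, j):
    """Sum of squared errors of a constant fit on y[i:j]."""
    s, s2, m = csum[j] - csum[i], csq[j] - csq[i], j - i
    return s2 - s * s / m

# cost[j] = best penalized cost of segmenting the first j points.
cost = np.full(n + 1, np.inf)
cost[0] = 0.0
back = np.zeros(n + 1, dtype=int)
for j in range(1, n + 1):
    for i in range(j):
        c = cost[i] + sse(i, j) + lam
        if c < cost[j]:
            cost[j], back[j] = c, i

# Recover segments and their fitted means; recommend the best dose interval.
segs, j = [], n
while j > 0:
    i = back[j]
    segs.append((dose[i], dose[j - 1], (csum[j] - csum[i]) / (j - i)))
    j = i
best = max(segs, key=lambda seg: seg[2])
print(f"recommended dose interval: [{best[0]:.2f}, {best[1]:.2f}], fitted outcome {best[2]:.2f}")
```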
A Minimax Learning Approach to Off-Policy Evaluation in Partially Observable Markov Decision Processes
Shi, Chengchun, Uehara, Masatoshi, Jiang, Nan
We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes (POMDPs), where the evaluation policy depends only on observable variables and the behavior policy depends on unobservable latent variables. Existing works either assume no unmeasured confounders, or focus on settings where both the observation and the state spaces are tabular. As such, these methods suffer from either a large bias in the presence of unmeasured confounders, or a large variance in settings with continuous or large observation/state spaces. In this work, we first propose novel identification methods for OPE in POMDPs with latent confounders, by introducing bridge functions that link the target policy's value and the observed data distribution. In fully-observable MDPs, these bridge functions reduce to the familiar value functions and marginal density ratios between the evaluation and the behavior policies. We next propose minimax estimation methods for learning these bridge functions. Our proposal permits general function approximation and is thus applicable to settings with continuous or large observation/state spaces. Finally, we construct three estimators based on these estimated bridge functions, corresponding to a value function-based estimator, a marginalized importance sampling estimator, and a doubly-robust estimator. Their nonasymptotic and asymptotic properties are investigated in detail.
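To see what the minimax machinery reduces to in the fully-observable special case mentioned above, the sketch below estimates the Q-function over a linear (one-hot, tabular) function class, where the inner maximization collapses and the estimator amounts to solving a linear system, as in LSTD. The toy MDP, the behavior and target policies, and the initial-state distribution are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
nS, nA, gamma, n = 4, 2, 0.9, 100000
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
Rmat = rng.normal(size=(nS, nA))
pi_b = np.full((nS, nA), 0.5)                          # behavior: uniform
pi_t = rng.dirichlet(np.ones(nA), size=nS)             # target policy

def phi(s, a):
    """One-hot feature for the state-action pair."""
    v = np.zeros(nS * nA)
    v[s * nA + a] = 1.0
    return v

# Accumulate the moment condition E[phi (phi - gamma * phi_pi')^T] w = E[phi r].
A = np.zeros((nS * nA, nS * nA))
b = np.zeros(nS * nA)
s = rng.integers(nS)
for _ in range(n):
    a = rng.choice(nA, p=pi_b[s])
    s_next = rng.choice(nS, p=P[s, a])
    phi_next = sum(pi_t[s_next, a2] * phi(s_next, a2) for a2 in range(nA))
    A += np.outer(phi(s, a), phi(s, a) - gamma * phi_next) / n
    b += phi(s, a) * Rmat[s, a] / n
    s = s_next

w = np.linalg.solve(A, b)                              # Q-function estimate
q = w.reshape(nS, nA)
init = np.full(nS, 1.0 / nS)                           # assumed initial distribution
print("estimated target-policy value:", round(float(init @ (q * pi_t).sum(axis=1)), 3))
```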
Testing Directed Acyclic Graph via Structural, Supervised and Generative Adversarial Learning
Shi, Chengchun, Zhou, Yunzhe, Li, Lexin
In this article, we propose a new hypothesis testing method for directed acyclic graphs (DAGs). While there is a rich class of DAG estimation methods, there is a relative paucity of DAG inference solutions. Moreover, the existing methods often impose specific model structures such as linear models or additive models, and assume independent data observations. Our proposed test instead allows the associations among the random variables to be nonlinear and the data to be time-dependent. We build the test on highly flexible neural network learners. We establish the asymptotic guarantees of the test, while allowing either the number of subjects or the number of time points for each subject to diverge to infinity. We demonstrate the efficacy of the test through simulations and a brain connectivity network analysis.
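One simple way to build such an edge test with flexible learners is the double-regression (generalized covariance) recipe sketched below: regress both endpoints on the conditioning variables, then check whether the residuals are correlated. This is a generic conditional-independence test rather than the paper's procedure, and the simulated null model is an assumption.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(8)
n = 2000
z = rng.normal(size=(n, 2))                            # conditioning variables
x = np.sin(z[:, 0]) + 0.5 * rng.normal(size=n)
y = z[:, 0] ** 2 + 0.5 * rng.normal(size=n)            # under H0: no direct edge x -> y

# Flexible regressions of each endpoint on the conditioning set; keep residuals.
rx = x - GradientBoostingRegressor().fit(z, x).predict(z)
ry = y - GradientBoostingRegressor().fit(z, y).predict(z)

# Studentized cross-covariance of the residuals is approximately N(0, 1) under H0.
prod = rx * ry
stat = np.sqrt(n) * prod.mean() / prod.std()
print("test statistic:", round(stat, 2), "(|stat| > 1.96 suggests a direct edge)")
```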
Deeply-Debiased Off-Policy Interval Estimation
Shi, Chengchun, Wan, Runzhe, Chernozhukov, Victor, Song, Rui
Reinforcement learning (RL, Sutton & Barto, 2018) is a general technique in sequential decision making that learns an optimal policy to maximize the average cumulative reward. Prior to adopting any policy in practice, it is crucial to know the impact of implementing such a policy. In many real domains such as healthcare (Murphy et al., 2001; Luedtke & van der Laan, 2017; Shi et al., 2020a), robotics (Andrychowicz et al., 2020) and autonomous driving (Sallab et al., 2017), it is costly, risky, unethical, or even infeasible to evaluate a policy's impact by directly running this policy. This motivates us to study the off-policy evaluation (OPE) problem, which learns a target policy's value with pre-collected data generated by a different behavior policy. In many applications (e.g., mobile health studies), the number of observations is limited. Take the OhioT1DM dataset (Marling & Bunescu, 2018) as an example: only a few thousand observations are available (Shi et al., 2020b). In such cases, in addition to a point estimate of a target policy's value, it is crucial to construct a confidence interval (CI) that quantifies the uncertainty of the value estimate. This paper is concerned with the following question: is it possible to develop a robust and efficient off-policy value estimator and provide rigorous uncertainty quantification under practically feasible conditions? We give an affirmative answer to this question.
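For contrast with the deeply-debiased approach, the simplest route to a confidence interval is sketched below: form per-trajectory importance-sampling estimates and apply a normal approximation. This baseline requires a known behavior policy and suffers from high variance over long horizons, which is precisely what motivates more refined estimators; the toy environment is an assumption.

```python
import numpy as np

rng = np.random.default_rng(9)
nS, nA, H, n_traj = 3, 2, 4, 5000
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
Rmat = rng.normal(size=(nS, nA))
pi_b = np.full((nS, nA), 0.5)                          # known behavior policy
pi_t = rng.dirichlet(np.ones(nA), size=nS)             # target policy

# Per-trajectory importance-sampling estimates of the target policy's return.
vals = []
for _ in range(n_traj):
    s, ratio, ret = rng.integers(nS), 1.0, 0.0
    for _ in range(H):
        a = rng.choice(nA, p=pi_b[s])
        ratio *= pi_t[s, a] / pi_b[s, a]               # cumulative importance ratio
        ret += Rmat[s, a]
        s = rng.choice(nS, p=P[s, a])
    vals.append(ratio * ret)

vals = np.array(vals)
se = vals.std(ddof=1) / np.sqrt(n_traj)
print(f"95% CI for the value: [{vals.mean() - 1.96 * se:.3f}, {vals.mean() + 1.96 * se:.3f}]")
```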