
 Luo, Shikai


Deep Distributional Learning with Non-crossing Quantile Network

arXiv.org Machine Learning

In this paper, we introduce a non-crossing quantile (NQ) network for conditional distribution learning. By leveraging non-negative activation functions, the NQ network ensures that the learned distributions remain monotonic, effectively addressing the issue of quantile crossing. Furthermore, the NQ network-based deep distributional learning framework is highly adaptable, applicable to a wide range of applications, from classical non-parametric quantile regression to more advanced tasks such as causal effect estimation and distributional reinforcement learning (RL). We also develop a comprehensive theoretical foundation for the deep NQ estimator and its application to distributional RL, providing an in-depth analysis that demonstrates its effectiveness across these domains. Our experimental results further highlight the robustness and versatility of the NQ network.
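
A minimal PyTorch-style sketch of the non-crossing idea described above, offered as an illustration only: the module name NQHead and its layer sizes are assumptions, not the authors' architecture. It predicts a base quantile plus non-negative increments (via a softplus activation) and cumulatively sums them, which is one way non-negative activations rule out quantile crossing by construction.

```python
import torch
import torch.nn as nn

class NQHead(nn.Module):
    """Illustrative non-crossing quantile head (not the paper's exact architecture).

    The network outputs a base quantile and non-negative increments; the cumulative
    sum of those increments yields quantile estimates that are monotone in the
    quantile level, so crossing cannot occur by construction.
    """

    def __init__(self, in_dim: int, n_quantiles: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.base = nn.Linear(hidden, 1)                    # lowest quantile
        self.increments = nn.Linear(hidden, n_quantiles - 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.body(x)
        base = self.base(h)
        # Softplus keeps the increments non-negative, enforcing monotonicity.
        deltas = nn.functional.softplus(self.increments(h))
        return torch.cat([base, base + torch.cumsum(deltas, dim=-1)], dim=-1)

# Example: 5 quantile levels for a batch of 8 inputs with 10-dimensional features.
q = NQHead(in_dim=10, n_quantiles=5)(torch.randn(8, 10))
assert torch.all(q[:, 1:] >= q[:, :-1])  # non-crossing by construction
```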


Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data

arXiv.org Artificial Intelligence

In real-world scenarios, datasets collected from randomized experiments are often constrained by size, due to limitations in time and budget. As a result, leveraging large observational datasets becomes a more attractive option for achieving high-quality policy learning. However, most existing offline reinforcement learning (RL) methods depend on two key assumptions, unconfoundedness and positivity, which frequently do not hold in observational data contexts. Recognizing these challenges, we propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL). We utilize a mediator variable based on the front-door criterion to remove the confounding bias; additionally, we adopt the pessimistic principle to address the distributional shift between the action distributions induced by candidate policies and the behavior policy that generates the observational data. Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function, to partially mitigate the issue of distributional shift. This insight significantly simplifies our algorithm by circumventing the challenging task of sequential uncertainty quantification for the estimated Q-function. Moreover, we provide theoretical guarantees for the proposed algorithms and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
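
A minimal single-stage, discrete-variable sketch of the pessimistic front-door idea, to make the abstract concrete: the function names and the Hoeffding-style penalty are illustrative assumptions, not PESCAL's actual estimator or its sequential form. It shows how the mediator distribution, rather than a Q-function, can be the object that is lower-bounded.

```python
import numpy as np

def lower_bound_mediator_dist(mediators, n_mediator_levels, delta=0.05):
    """Empirical mediator distribution minus a Hoeffding-style width (illustrative)."""
    n = len(mediators)
    if n == 0:
        return np.zeros(n_mediator_levels)
    p_hat = np.bincount(mediators, minlength=n_mediator_levels) / n
    width = np.sqrt(np.log(2.0 / delta) / (2.0 * n))
    return np.clip(p_hat - width, 0.0, 1.0)

def pessimistic_value(actions, mediators, rewards, policy, n_actions, n_mediators):
    """Single-stage front-door value estimate with a pessimistic mediator distribution.

    `actions` and `mediators` are integer-coded arrays; policy[a] is the probability
    the candidate policy assigns to action a.  Outcome means are estimated per
    mediator level, and the action only enters through the (lower-bounded) mediator
    distribution, so no Q-function uncertainty quantification is needed here.
    """
    r_given_m = np.array([
        rewards[mediators == m].mean() if np.any(mediators == m) else 0.0
        for m in range(n_mediators)
    ])
    value = 0.0
    for a in range(n_actions):
        p_m_lower = lower_bound_mediator_dist(mediators[actions == a], n_mediators)
        value += policy[a] * np.dot(p_m_lower, r_given_m)
    return value
```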


Policy Evaluation for Temporal and/or Spatial Dependent Experiments

arXiv.org Machine Learning

The aim of this paper is to establish a causal link between the policies implemented by technology companies and the outcomes they yield in experiments with intricate temporal and/or spatial dependence. We propose a novel temporal/spatio-temporal Varying Coefficient Decision Process (VCDP) model, capable of effectively capturing the evolving treatment effects in settings characterized by temporal and/or spatial dependence. Our methodology decomposes the Average Treatment Effect (ATE) into the Direct Effect (DE) and the Indirect Effect (IE). We subsequently devise comprehensive procedures for estimating and making inferences about both DE and IE. Additionally, we provide a rigorous analysis of the statistical properties of these procedures, such as asymptotic power. To substantiate the effectiveness of our approach, we carry out extensive simulations and real data analyses.
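
A hedged sketch of one standard way such a decomposition can be written in a sequential experiment, with potential-outcome notation assumed here rather than taken from the paper (the paper's exact definitions may differ): the direct effect captures the contemporaneous impact of the current treatment, while the indirect effect captures what earlier treatments transmit through carryover.

```latex
% Y_t(\bar a_{t-1}, a): potential outcome at time t under treatment history
% \bar a_{t-1} and current treatment a; \bar 1, \bar 0 denote always-treated /
% always-control histories.  The identity below is a simple telescoping split.
\mathrm{ATE}_t
  = \mathbb{E}\!\left[Y_t(\bar 1_{t-1}, 1) - Y_t(\bar 0_{t-1}, 0)\right]
  = \underbrace{\mathbb{E}\!\left[Y_t(\bar 1_{t-1}, 1) - Y_t(\bar 1_{t-1}, 0)\right]}_{\mathrm{DE}_t}
  + \underbrace{\mathbb{E}\!\left[Y_t(\bar 1_{t-1}, 0) - Y_t(\bar 0_{t-1}, 0)\right]}_{\mathrm{IE}_t}
```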


Robust Offline Policy Evaluation and Optimization with Heavy-Tailed Rewards

arXiv.org Machine Learning

This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation (OPE) and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-means method with offline RL, enabling straightforward uncertainty estimation for the value function estimator. This not only adheres to the principle of pessimism in OPO but also adeptly handles heavy-tailed rewards. Theoretical results and extensive experiments demonstrate that our two frameworks outperform existing methods when the logged dataset exhibits heavy-tailed reward distributions.
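
A minimal numerical sketch of the median-of-means device, hedged: it shows the generic estimator on i.i.d. samples rather than the papers' value-function version, and the block count and dispersion-based uncertainty proxy are illustrative choices.

```python
import numpy as np

def median_of_means(x, n_blocks=10, rng=None):
    """Median-of-means estimate plus a simple dispersion-based uncertainty proxy.

    Splitting the data into blocks, averaging within each block, and taking the
    median of the block means is robust to heavy-tailed observations; the spread
    of the block means gives a straightforward uncertainty estimate that a
    pessimistic (lower-confidence-bound) policy optimizer can subtract.
    """
    rng = np.random.default_rng(rng)
    x = rng.permutation(np.asarray(x, dtype=float))
    block_means = np.array([b.mean() for b in np.array_split(x, n_blocks)])
    estimate = np.median(block_means)
    # Median absolute deviation of the block means, scaled for sampling noise.
    uncertainty = 1.4826 * np.median(np.abs(block_means - estimate)) / np.sqrt(n_blocks)
    return estimate, uncertainty

# Heavy-tailed rewards: the plain mean is dragged by outliers, median-of-means is not.
rewards = np.random.default_rng(0).standard_t(df=1.5, size=5_000)
est, unc = median_of_means(rewards)
pessimistic_value = est - unc   # lower bound in the spirit of pessimism
```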


A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets

arXiv.org Artificial Intelligence

This paper concerns applications in two-sided markets that involve a group of subjects making sequential decisions across time and/or location. In particular, we consider large-scale fleet management in ride-sharing companies, such as Uber, Lyft and Didi. These companies form a typical two-sided market that enables efficient interactions between passengers and drivers (Armstrong, 2006; Rysman, 2009). With the rapid development of smartphones and the Internet of Things, they have substantially transformed the transportation landscape (Frenken and Schor, 2017; Jin et al., 2018; Hagiu and Wright, 2019). With rich information on passenger demand and the locations of taxi drivers, they significantly reduce taxi cruise time and passenger waiting time in comparison to traditional taxi systems (Li et al., 2011; Zhang et al., 2014; Miao et al., 2016). We use the numbers of drivers and call orders to measure supply and demand at a given time and location. Both supply and demand are spatio-temporal processes, and they interact with each other. These processes depend strongly on the platform's policies and have a large impact on the platform's outcomes of interest, such as drivers' income level and working time, passengers' satisfaction rate, order answering rate and order finishing rate.


Conformal Off-policy Prediction

arXiv.org Artificial Intelligence

Off-policy evaluation is critical in a number of applications where new policies need to be evaluated offline before online deployment. Most existing methods focus on the expected return, define the target parameter through averaging and provide a point estimator only. In this paper, we develop a novel procedure to produce reliable interval estimators for a target policy's return starting from any initial state. Our proposal accounts for the variability of the return around its expectation, focuses on the individual effect and offers valid uncertainty quantification. Our main idea lies in designing a pseudo policy that generates subsamples as if they were sampled from the target policy, so that existing conformal prediction algorithms are applicable to prediction interval construction. Our methods are justified theoretically and validated on both synthetic data and real data from short-video platforms.
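
A minimal split-conformal sketch of the subsampling idea, hedged: the agreement-based subsampling rule, the deterministic target policy and the state-independent point predictor are simplifying assumptions for illustration, not the paper's pseudo-policy construction.

```python
import numpy as np

def conformal_return_interval(states0, actions, returns, target_policy,
                              alpha=0.1, rng=0):
    """Split-conformal interval for the return under the target policy (illustrative).

    We keep trajectories whose logged first action agrees with the (deterministic)
    target policy, so the retained returns behave as if sampled under that policy,
    then apply standard split conformal prediction to them.
    """
    keep = np.array([a == target_policy(s) for s, a in zip(states0, actions)])
    g = np.asarray(returns, dtype=float)[keep]
    g = np.random.default_rng(rng).permutation(g)
    fit, calib = g[: len(g) // 2], g[len(g) // 2:]
    center = fit.mean()                              # simple point predictor
    scores = np.abs(calib - center)                  # conformity scores
    k = int(np.ceil((1 - alpha) * (len(calib) + 1))) - 1
    q = np.sort(scores)[min(k, len(scores) - 1)]
    return center - q, center + q
```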


An Instrumental Variable Approach to Confounded Off-Policy Evaluation

arXiv.org Artificial Intelligence

Offline policy evaluation (OPE) estimates the discounted cumulative reward following a given target policy with an offline dataset collected from another (possibly unknown) behavior policy. OPE is important in situations where it is impractical or too costly to directly evaluate the target policy via online experimentation, including robotics (Quillen et al., 2018), precision medicine (Murphy, 2003; Kosorok and Laber, 2019; Tsiatis et al., 2019), economics and quantitative social science (Abadie and Cattaneo, 2018), and recommendation systems (Li et al., 2010; Kiyohara et al., 2022). Despite a large body of literature on OPE (see Section 2 for detailed discussions), many of these methods rely on the assumption of no unmeasured confounders (NUC), excluding the existence of unobserved variables that could potentially confound either the action-reward or action-next-state pair. This assumption, however, can be violated in real-world applications such as healthcare and the technology industry. Our paper is partly motivated by the need to evaluate the long-term treatment effects of certain app download ads from a short-video platform.


Quantile Off-Policy Evaluation via Deep Conditional Generative Learning

arXiv.org Artificial Intelligence

Off-policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy. It is critical in a number of sequential decision-making problems ranging from healthcare to technology industries. Most of the existing literature focuses on evaluating the mean outcome of a given policy and ignores the variability of the outcome. However, in a variety of applications, criteria other than the mean may be more sensible. For example, when the reward distribution is skewed and asymmetric, quantile-based metrics are often preferred for their robustness. In this paper, we propose a doubly robust inference procedure for quantile OPE in sequential decision making and study its asymptotic properties. In particular, we propose utilizing state-of-the-art deep conditional generative learning methods to handle parameter-dependent nuisance function estimation. We demonstrate the advantages of the proposed estimator through both simulations and a real-world dataset from a short-video platform. In particular, we find that our proposed estimator outperforms classical OPE estimators for the mean in settings with heavy-tailed reward distributions.
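
A hedged sketch of the generic estimating equation behind quantile OPE, with notation assumed here rather than taken from the paper; the paper's doubly robust construction and its deep generative nuisance estimates are more involved, typically augmenting an importance-weighted equation of this form with a model of the conditional return distribution.

```latex
% The tau-th quantile q_tau of the return G under the target policy pi solves
\mathbb{E}^{\pi}\!\left[\,\mathbf{1}\{G \le q_\tau\} - \tau\,\right] = 0,
% and with offline data from a behavior policy b it can be rewritten via the
% importance ratio \rho = \prod_t \pi(A_t \mid S_t)/b(A_t \mid S_t):
\mathbb{E}^{b}\!\left[\,\rho\,\bigl(\mathbf{1}\{G \le q_\tau\} - \tau\bigr)\right] = 0.
```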


Sure Screening for Gaussian Graphical Models

arXiv.org Machine Learning

We propose graphical sure screening, or GRASS, a very simple and computationally efficient screening procedure for recovering the structure of a Gaussian graphical model in the high-dimensional setting. The GRASS estimate of the conditional dependence graph is obtained by thresholding the elements of the sample covariance matrix. The proposed approach possesses the sure screening property: with very high probability, the GRASS estimated edge set contains the true edge set. Furthermore, with high probability, the size of the estimated edge set is controlled. We provide a choice of threshold for GRASS that can control the expected false positive rate. We illustrate the performance of GRASS in a simulation study and on a gene expression data set, and show that in practice it performs quite competitively with more complex and computationally demanding techniques for graph estimation.
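
A minimal sketch of the screening rule described above, hedged: the threshold is passed in as a user-chosen placeholder rather than the paper's theory-driven choice that controls the expected false positive rate.

```python
import numpy as np

def grass_edges(X, threshold):
    """Screen edges by thresholding off-diagonal sample covariances (illustrative).

    X is an n-by-p data matrix; an edge (j, k) is retained whenever the absolute
    sample covariance between variables j and k exceeds the threshold.
    """
    S = np.cov(X, rowvar=False)
    p = S.shape[0]
    keep = np.abs(S) > threshold
    np.fill_diagonal(keep, False)
    return [(j, k) for j in range(p) for k in range(j + 1, p) if keep[j, k]]

# Example on synthetic data: 200 observations of 50 variables.
X = np.random.default_rng(1).standard_normal((200, 50))
edges = grass_edges(X, threshold=0.2)
```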