Goto

Collaborating Authors: Yasui, Shota


Automatic Debiased Learning from Positive, Unlabeled, and Exposure Data

arXiv.org Artificial Intelligence

We address binary classification from positive and unlabeled data (PU classification) with a selection bias in the positive data. During the observation process, (i) a sample is exposed to a user, (ii) the user then returns a label for the exposed sample, and (iii) we can only observe the samples labeled positive. The positive labels that we observe therefore reflect both the exposure and the labeling steps, which creates a selection bias in the observed positive samples. This scenario is a conceptual framework for many practical applications, such as recommender systems, and we refer to it as ``learning from positive, unlabeled, and exposure data'' (PUE classification). To tackle this problem, we first assume access to data with exposure labels. We then show how to identify the function of interest under a strong ignorability assumption and develop an ``Automatic Debiased PUE'' (ADPUE) learning method. This algorithm corrects the selection bias directly, without requiring intermediate estimates such as the propensity score that other learning methods rely on. Through experiments, we demonstrate that our approach outperforms traditional PU learning methods on various semi-synthetic datasets.
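The abstract does not spell out the ADPUE estimator, so the snippet below is only a minimal sketch of the standard non-negative PU risk that PU methods typically build on, assuming a known class prior and a simple logistic model; it is a generic PU baseline, not the paper's debiased objective.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nnpu_risk(w, x_pos, x_unl, prior):
    """Non-negative PU risk with a logistic loss (generic baseline, not ADPUE).

    `prior` is the assumed-known positive class prior pi = P(y = +1);
    `x_pos` holds observed positive samples and `x_unl` unlabeled samples.
    """
    # Positive part: pi * E_p[loss on positives].
    loss_pos = -np.log(sigmoid(x_pos @ w) + 1e-12).mean()
    # The negative risk is approximated from unlabeled data:
    # E_u[loss(-1)] - pi * E_p[loss(-1)], clipped at zero to avoid overfitting.
    loss_unl_neg = -np.log(1.0 - sigmoid(x_unl @ w) + 1e-12).mean()
    loss_pos_neg = -np.log(1.0 - sigmoid(x_pos @ w) + 1e-12).mean()
    neg_part = max(0.0, loss_unl_neg - prior * loss_pos_neg)
    return prior * loss_pos + neg_part
```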


Learning Causal Relationships from Conditional Moment Conditions by Importance Weighting

arXiv.org Machine Learning

We consider learning causal relationships under conditional moment conditions. Unlike causal inference under unconditional moment conditions, conditional moment conditions pose serious challenges for causal inference, especially in complex, high-dimensional settings. To address this issue, we propose a method that transforms conditional moment conditions into unconditional moment conditions through importance weighting with the conditional density ratio. Using this transformation, we then propose a method that approximates the conditional moment conditions. Our approach allows us to apply methods for estimating causal parameters from unconditional moment conditions, such as the generalized method of moments, in a straightforward manner. In experiments, we confirm that the proposed method performs well compared to existing methods.
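As a rough illustration of the core idea, turning a conditional moment condition into unconditional ones and solving them by GMM, here is a toy sketch; the simple weight functions of z stand in for the paper's conditional density-ratio weighting and are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data for the conditional moment condition E[y - theta * x | z] = 0.
n = 5000
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(scale=0.5, size=n)
y = 2.0 * x + rng.normal(scale=0.3, size=n)   # true theta = 2.0

# Convert the conditional moment into several unconditional ones by weighting
# with functions of z (a crude stand-in for density-ratio weighting),
# then solve the resulting GMM problem with an identity weighting matrix.
weight_fns = np.stack([np.ones(n), z, z ** 2], axis=1)   # w_j(z)

def gmm_objective(theta):
    residual = y - theta * x
    g = (weight_fns * residual[:, None]).mean(axis=0)    # E[w_j(z) * psi(theta)]
    return g @ g

# A one-dimensional grid search keeps the sketch dependency-free.
grid = np.linspace(0.0, 4.0, 4001)
theta_hat = grid[np.argmin([gmm_objective(t) for t in grid])]
print(f"estimated theta: {theta_hat:.3f}")
```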


A Practical Guide of Off-Policy Evaluation for Bandit Problems

arXiv.org Machine Learning

Off-policy evaluation (OPE) is the problem of estimating the value of a target policy from samples obtained via different policies. Recently, applying OPE methods to bandit problems has garnered attention. To guarantee the theoretical properties of an estimator of the policy value, OPE methods require various conditions on the target policy and on the policy used to generate the samples. However, existing studies do not carefully discuss the practical situations in which such conditions hold, and a gap between theory and practice remains. This paper aims to present new results for bridging that gap. Based on the properties of the evaluation policy, we categorize OPE situations. Then, among practical applications, we focus mainly on best policy selection, for which we propose a meta-algorithm based on existing OPE estimators. We investigate the proposed concepts in experiments using synthetic and open real-world datasets.
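For context, the sketch below shows the standard inverse-propensity-weighted (IPW) policy-value estimate that such meta-algorithms typically wrap; it is a textbook OPE baseline rather than the paper's proposed procedure.

```python
import numpy as np

def ipw_policy_value(rewards, logging_propensities, target_probs):
    """Inverse-propensity-weighted (IPW) estimate of a target policy's value.

    `logging_propensities[i]` is the behavior policy's probability of the
    logged action, and `target_probs[i]` is the evaluation policy's
    probability of that same action given the same context.
    """
    weights = target_probs / np.clip(logging_propensities, 1e-6, None)
    return float(np.mean(weights * rewards))

# Best policy selection (illustrative): estimate each candidate's value and keep the argmax.
# values = {name: ipw_policy_value(r, p_log, p_e) for name, p_e in candidates.items()}
```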


Learning Classifiers under Delayed Feedback with a Time Window Assumption

arXiv.org Machine Learning

We consider training a binary classifier under delayed feedback (DF learning). In DF learning, we first receive negative samples; subsequently, some of them turn positive. This problem arises in various real-world applications such as online advertising, where the user action may take place long after the first click. Owing to the delayed feedback, simply separating the positive and negative data causes a sample selection bias. One solution is to assume that a sufficiently long time window after first observing a sample reduces this bias. However, existing studies report that using only the portion of samples that satisfy the time window assumption yields suboptimal performance, whereas using all samples together with the time window assumption improves empirical performance. Extending these studies, we propose a method with an unbiased and convex empirical risk constructed from all samples under the time window assumption. We provide experimental results on a real traffic log dataset to demonstrate the effectiveness of the proposed method.
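As a reference point, the sketch below implements the simple baseline the abstract alludes to: train only on samples whose elapsed time exceeds the window, so their labels can be treated as final. The paper's unbiased, convex risk over all samples is not reproduced here.

```python
import numpy as np

def window_filtered_logistic_risk(w, features, labels, elapsed, window):
    """Logistic risk on samples whose feedback window has fully elapsed.

    `elapsed[i]` is the time since sample i was first observed; samples with
    elapsed >= window are assumed to have final labels (1 = converted, 0 = not).
    This is the suboptimal portion-of-samples baseline, not the proposed risk.
    """
    mature = elapsed >= window
    z = features[mature] @ w
    margin = (2 * labels[mature] - 1) * z
    # Numerically stable logistic loss: log(1 + exp(-margin)).
    return float(np.mean(np.logaddexp(0.0, -margin)))
```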


Dual Learning Algorithm for Delayed Feedback in Display Advertising

arXiv.org Machine Learning

In display advertising, predicting the conversion rate, that is, the probability that a user takes a predefined action on an advertiser's website, is fundamental for estimating the value of showing the user an advertisement. Delayed feedback causes two troublesome difficulties for conversion rate prediction. First, some positive labels are not correctly observed in the training data, because some conversions do not occur right after the ad is clicked. Moreover, the delay mechanism is not uniform across instances; some positive feedback is observed much more frequently than others. It is widely acknowledged that these problems cause a severe bias in the naive empirical average loss function for conversion rate prediction. To overcome these challenges, we propose two unbiased estimators, one for the conversion rate prediction and the other for the bias estimation. We then propose an alternating learning algorithm named {\em Dual Learning Algorithm for Delayed Feedback (DLA-DF)}, in which a conversion rate predictor and a bias estimator are learned alternately. The proposed algorithm is the first of its kind to address the two major challenges in a theoretically principled way. Lastly, we conduct a simulation experiment to demonstrate that the proposed method outperforms existing baselines and to validate that the unbiased estimation approach is suitable for the delayed feedback problem.
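The following skeleton, written under loose assumptions, only illustrates the alternating structure described above: a conversion-rate predictor and a bias (observation-probability) estimator are updated in turn, with the bias estimate fed back as an inverse-propensity weight. The actual DLA-DF losses are not reproduced, and the weighting choice here is illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def alternate_training(x, observed_label, n_rounds=10, lr=0.1):
    """Alternating updates of a CVR model and a bias model (illustrative only).

    `observed_label` is the (possibly delayed, hence biased) conversion label
    currently available; both models are plain logistic regressions.
    """
    d = x.shape[1]
    w_cvr = np.zeros(d)    # conversion-rate model parameters
    w_bias = np.zeros(d)   # observation-probability model parameters
    for _ in range(n_rounds):
        # Step 1: update the bias estimator toward the currently observed labels.
        p_obs = sigmoid(x @ w_bias)
        w_bias += lr * x.T @ (observed_label - p_obs) / len(x)
        # Step 2: update the CVR predictor, reweighting each sample by the
        # inverse of its estimated observation probability.
        weight = 1.0 / np.clip(sigmoid(x @ w_bias), 1e-3, 1.0)
        p_cvr = sigmoid(x @ w_cvr)
        w_cvr += lr * x.T @ (weight * (observed_label - p_cvr)) / len(x)
    return w_cvr, w_bias
```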


Counterfactual Cross-Validation: Effective Causal Model Selection from Observational Data

arXiv.org Machine Learning

What is the most effective way to select the best causal model among potential candidates? In this paper, we propose a method to select the best individual-level treatment effect (ITE) predictor from a set of candidates using only an observational validation set. In model selection or hyperparameter tuning, we are interested in choosing the best model or hyperparameter values from the potential candidates, so we focus on accurately preserving the rank order of the ITE prediction performance of the candidate causal models. The proposed evaluation metric is theoretically proven to preserve the true ranking of model performance in expectation and to minimize the upper bound of the finite-sample uncertainty in model selection. Consistent with the theoretical results, empirical experiments demonstrate that our method is more likely to select the best model and hyperparameter settings in both model selection and hyperparameter tuning.
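To make the selection procedure concrete, here is a hedged sketch that ranks candidate ITE predictors against a doubly robust pseudo-outcome on the validation set. The paper's counterfactual cross-validation metric is related in spirit but not identical, and the nuisance estimates (propensity, outcome models) and the candidates' `predict` interface are assumptions for illustration.

```python
import numpy as np

def dr_pseudo_outcome(y, t, propensity, mu0, mu1):
    """Doubly robust transformation whose conditional mean is the true ITE."""
    return (mu1 - mu0
            + t * (y - mu1) / np.clip(propensity, 1e-3, 1.0)
            - (1 - t) * (y - mu0) / np.clip(1.0 - propensity, 1e-3, 1.0))

def rank_ite_models(candidates, x_val, y_val, t_val, propensity, mu0, mu1):
    """Rank candidate ITE predictors by squared error against the pseudo-outcome.

    `candidates` maps names to fitted models exposing predict(x); `propensity`,
    `mu0`, and `mu1` are plug-in nuisance estimates on the validation set.
    """
    pseudo = dr_pseudo_outcome(y_val, t_val, propensity, mu0, mu1)
    scores = {name: float(np.mean((model.predict(x_val) - pseudo) ** 2))
              for name, model in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])   # lower is better
```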


A Contextual Bandit Algorithm for Ad Creative under Ad Fatigue

arXiv.org Machine Learning

Selecting ad creatives is one of the most important tasks for DSPs (demand-side platforms) in online advertising. A DSP should consider not only the effectiveness of an ad creative but also the user's psychological state when selecting it. In this study, we propose an efficient and easy-to-implement ad creative selection algorithm that explicitly considers the wear-in and wear-out effects of ad creatives caused by repetitive ad exposure. The proposed system was deployed in a real-world production environment and tested against the baseline; it outperformed the existing system on most KPIs.
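One simple way to encode ad fatigue in a contextual bandit, sketched below, is to append each creative's past exposure count for the user to its context vector inside a LinUCB-style learner. This is an illustrative stand-in under those assumptions, not the deployed algorithm.

```python
import numpy as np

class LinUCBWithFatigue:
    """LinUCB-style creative selection with an exposure-count feature."""

    def __init__(self, dim, alpha=1.0):
        # `dim` is the context dimension plus one for the exposure count.
        self.alpha = alpha
        self.A = np.eye(dim)       # regularized design matrix
        self.b = np.zeros(dim)     # response vector

    def _features(self, context, exposures):
        # The exposure count serves as a crude fatigue (wear-in/wear-out) signal.
        return np.append(context, exposures)

    def select(self, contexts, exposure_counts):
        """Pick the creative index with the highest upper confidence bound."""
        theta = np.linalg.solve(self.A, self.b)
        best, best_ucb = 0, -np.inf
        for k, (ctx, exp_k) in enumerate(zip(contexts, exposure_counts)):
            x = self._features(ctx, exp_k)
            ucb = x @ theta + self.alpha * np.sqrt(x @ np.linalg.solve(self.A, x))
            if ucb > best_ucb:
                best, best_ucb = k, ucb
        return best

    def update(self, context, exposures, reward):
        x = self._features(context, exposures)
        self.A += np.outer(x, x)
        self.b += reward * x
```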


Efficient Counterfactual Learning from Bandit Feedback

arXiv.org Machine Learning

What is the most statistically efficient way to do off-policy evaluation and optimization with batch data from bandit feedback? For log data generated by contextual bandit algorithms, we consider offline estimators of the expected reward from a counterfactual policy. Our estimators are shown to have the lowest variance within a wide class of estimators, achieving variance reduction relative to standard estimators. We also apply our estimators to improve online advertisement design at a major advertising company. Consistent with the theoretical results, our estimators allow us to improve on the existing bandit algorithm with greater statistical confidence than a state-of-the-art benchmark.
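As a hedged sketch of one variance-reduction route discussed in the OPE literature, re-estimating the logging propensity from the log even when it is known, the snippet below fits a logistic propensity model and plugs it into IPW. It should not be read as the paper's exact efficient estimator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_with_estimated_propensity(contexts, actions, rewards, target_probs):
    """IPW policy-value estimate using *estimated* logging propensities.

    `actions` are the logged (integer-coded) actions and `target_probs[i]` is
    the evaluation policy's probability of the logged action given context i.
    """
    prop_model = LogisticRegression(max_iter=1000).fit(contexts, actions)
    all_probs = prop_model.predict_proba(contexts)
    # Probability the logging policy assigned the action that was actually taken.
    idx = np.searchsorted(prop_model.classes_, actions)
    logged_prop = all_probs[np.arange(len(actions)), idx]
    weights = target_probs / np.clip(logged_prop, 1e-3, None)
    return float(np.mean(weights * rewards))
```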