performance metric
Regret Bounds for Non-decomposable Metrics with Missing Labels
We consider the problem of recommending relevant labels (items) for a given data point (user). In particular, we are interested in the practically important setting where the evaluation is with respect to non-decomposable (over labels) performance metrics like the $F_1$ measure, \emph{and} training data has missing labels. To this end, we propose a generic framework that given a performance metric $\Psi$, can devise a regularized objective function and a threshold such that all the values in the predicted score vector above and only above the threshold are selected to be positive. We show that the regret or generalization error in the given metric $\Psi$ is bounded ultimately by estimation error of certain underlying parameters. In particular, we derive regret bounds under three popular settings: a) collaborative filtering, b) multilabel classification, and c) PU (positive-unlabeled) learning. For each of the above problems, we can obtain precise non-asymptotic regret bound which is small even when a large fraction of labels is missing. Our empirical results on synthetic and benchmark datasets demonstrate that by explicitly modeling for missing labels and optimizing the desired performance metric, our algorithm indeed achieves significantly better performance (like $F_1$ score) when compared to methods that do not model missing label information carefully.
e197fe307eb3467035f892dc100d570a-Supplemental-Conference.pdf
The process for calculating these metrics is described in Appendix C. Moreover, to ensure the comparability between prediction performance metrics and driving performance metrics in the radar plot, we normalize all metrics to the scale of [0, 1]. In the subsequent section, we provide an overview of the DESPOT planner. These two values can only be inferred from history. The safety is represented by the normalized collision rate.
A Proof of Proposition 1 Proof: First, it is straightforward to show that the IPW estimator of the ground truth treatment effect ˆ δ
We proceed to compute the variances of each estimator. The proof also holds for the non-zero mean case trivially. Causal model details for Section 5.2 In Section 5.2, We include a wide range of machine learning-based causal inference methods to evaluate the performance of causal error estimators. Others configs are kept as default. The others are kept as default.
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- North America > United States > Wisconsin (0.04)
- North America > United States > Texas (0.04)
- North America > United States > Michigan (0.04)
- (3 more...)
- Research Report > Experimental Study (0.47)
- Research Report > New Finding (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
- North America > United States > Illinois (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
- North America > United States > Texas (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)
- North America > United States > Illinois (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > Middle East > Jordan (0.04)