Faster Online Learning of Optimal Threshold for Consistent F-measure Optimization

Zhang, Xiaoxuan, Liu, Mingrui, Zhou, Xun, Yang, Tianbao

Neural Information Processing Systems

In this paper, we consider online F-measure optimization (OFO). Unlike traditional performance metrics (e.g., classification error rate), the F-measure is non-decomposable over training examples and is a non-convex function of model parameters, making it much more difficult to optimize in an online fashion. Most existing OFO methods suffer from high memory/computational costs and/or lack a statistical consistency guarantee for optimizing the F-measure at the population level. To advance OFO, we propose an efficient online algorithm that simultaneously learns the posterior class probability and an optimal threshold, the latter by minimizing a stochastic strongly convex function with an unknown strong convexity parameter. A key component of the proposed method is a novel stochastic algorithm with low memory and computational costs, which enjoys a convergence rate of $\widetilde O(1/\sqrt{n})$ for learning the optimal threshold under a mild condition on the convergence of the posterior probability, where $n$ is the number of processed examples. It is provably faster than its predecessor, which updates the threshold based on a heuristic. Experiments verify the efficiency of the proposed algorithm in comparison with state-of-the-art OFO algorithms.
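For concreteness, below is a minimal Python sketch of the two-component scheme described in the abstract. It is an illustration, not the paper's actual Algorithm 2: the posterior is estimated with plain online logistic regression; the threshold objective is a standard strongly convex surrogate built from the known fixed-point property that the optimal threshold equals half the optimal F-measure ($\theta^* = F^*/2$); and the step sizes are simple $1/\sqrt{t}$ placeholders rather than the paper's schedule for handling the unknown strong convexity parameter.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def ofo_sketch(stream, dim, lr_w=0.1, lr_theta=0.5):
        # w: online logistic model for the posterior eta(x) = P(y = 1 | x).
        # theta: threshold iterate; the optimum is F*/2, hence in (0, 1/2].
        w = np.zeros(dim)
        theta = 0.25
        for t, (x, y) in enumerate(stream, start=1):
            # Online posterior estimation: one SGD step on the logistic loss.
            p = sigmoid(w @ x)
            w -= (lr_w / np.sqrt(t)) * (p - y) * x
            # Stochastic gradient step on a threshold objective Q whose
            # gradient is E[eta]*theta - E[max(eta - theta, 0)]; its root is
            # the theta satisfying theta = F(theta)/2, i.e., the optimal
            # threshold. Memory and per-example cost are both O(dim).
            eta_hat = sigmoid(w @ x)
            g = eta_hat * theta - max(eta_hat - theta, 0.0)
            theta = min(max(theta - (lr_theta / np.sqrt(t)) * g, 1e-6), 0.5)
        return w, theta  # predict positive when sigmoid(w @ x) >= theta

Note that the unknown $\eta(x)$ is replaced by the estimate $\hat\eta(x)$ inside the threshold gradient, so the gradient estimate is biased whenever $\hat\eta \neq \eta$; this is exactly the issue flagged in point A) of the review below.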


Reviews: Faster Online Learning of Optimal Threshold for Consistent F-measure Optimization

Neural Information Processing Systems

A) My main concern with this paper is with respect to the main results (Theorems 2 and 3). It seems the authors have not taken sufficient care with the fact that $\partial \hat{Q}$ in Algorithm 2 is a biased estimator of the true gradient $\partial Q$. Also, $\hat{Q}$, defined in Line 189, depends on $\hat{\pi}$, which is an estimate of $\pi$. Thus, a probabilistic proof would need to consider the probability of the estimation of $Q$ conditioned on the estimation of $\pi$.

B) Regardless of the above, the final high-probability statements in Theorems 2 and 3 seem to be missing a union bound over the error probabilities in Assumption 1.
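To unpack point B): if Assumption 1 supplies, for each round $t$, an event $E_t$ (e.g., the posterior estimate being sufficiently accurate at round $t$) that holds with probability at least $1 - \delta_t$, then a guarantee holding over all $n$ rounds simultaneously requires the union bound

    $\Pr\big[\bigcap_{t=1}^{n} E_t\big] \;\ge\; 1 - \sum_{t=1}^{n} \Pr[E_t^c] \;\ge\; 1 - \sum_{t=1}^{n} \delta_t,$

so each per-round failure probability must be scaled down to roughly $\delta/n$, typically costing an extra $\log n$ factor that is absorbed into the $\widetilde O(1/\sqrt{n})$ rate. Here $E_t$ and $\delta_t$ are stand-ins; Assumption 1's exact event is not reproduced in this excerpt.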

