
### f-divergence estimation and two-sample homogeneity test under semiparametric density-ratio models

A density ratio is the ratio of two probability densities. We study inference for density ratios and apply a semiparametric density-ratio estimator to the two-sample homogeneity test. In the proposed procedure, a density-ratio estimator is used to estimate the f-divergence between the two probability densities, and the resulting f-divergence estimator is then used for the two-sample homogeneity test. We derive the f-divergence estimator that is optimal in the sense of asymptotic variance. We then investigate how the proposed test relates to the existing score test based on the empirical likelihood estimator. Through numerical studies, we illustrate that the asymptotic theory is adequate for finite-sample inference.
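To make the pipeline concrete, here is a minimal sketch that runs from density-ratio estimation to f-divergence estimation under a log-linear semiparametric model $r(x) = \exp(\theta_0 + \theta_1 x)$. The model is fitted here by logistic regression, which is one common way to fit such a model. The divergence is then estimated by the plain plug-in mean $\hat D_f = \frac{1}{n}\sum_j f(\hat r(x_j))$ over denominator samples. This plug-in mean is *not* the asymptotically optimal estimator derived in the paper. The function names, the one-dimensional feature $x$, and the Gaussian test data are all illustrative assumptions.

```python
import numpy as np


def fit_log_linear_ratio(x_num, x_den, n_iter=50):
    """Hypothetical helper: fit r(x) = exp(theta0 + theta1 * x) for
    r = p_num / p_den by logistic regression (Newton-Raphson)."""
    x = np.concatenate([x_num, x_den])
    y = np.concatenate([np.ones(len(x_num)), np.zeros(len(x_den))])
    X = np.column_stack([np.ones_like(x), x])  # design matrix [1, x]
    w = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))             # P(y = 1 | x)
        grad = X.T @ (y - p)                           # log-likelihood gradient
        hess = X.T @ (X * (p * (1.0 - p))[:, None])    # negative Hessian
        w = w + np.linalg.solve(hess, grad)            # Newton step
    # The fitted logit equals log r(x) + log(n_num / n_den); remove that offset.
    theta0 = w[0] - np.log(len(x_num) / len(x_den))
    return theta0, w[1]


def f_divergence_plugin(f, theta, x_den):
    """Plug-in estimate of D_f(P || Q) = E_Q[f(r(X))] from denominator samples."""
    theta0, theta1 = theta
    r = np.exp(theta0 + theta1 * x_den)
    return float(np.mean(f(r)))


rng = np.random.default_rng(0)
x_p = rng.normal(1.0, 1.0, 5000)  # numerator sample, P = N(1, 1)
x_q = rng.normal(0.0, 1.0, 5000)  # denominator sample, Q = N(0, 1)
theta = fit_log_linear_ratio(x_p, x_q)
# f(t) = t log t turns D_f(P || Q) into KL(P || Q)
kl = f_divergence_plugin(lambda t: t * np.log(t), theta, x_q)
# Ground truth for this pair: theta = (-0.5, 1.0) and KL(P || Q) = 0.5
print(theta, kl)
```

For this Gaussian pair the log-linear model is exactly correct, so the fitted $(\theta_0, \theta_1)$ should be close to $(-0.5, 1)$. A homogeneity test would reject $P = Q$ when the divergence estimate is large. Calibrating that threshold needs the asymptotic theory described above, and this sketch does not attempt it.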

### Direct Density-Derivative Estimation and Its Application in KL-Divergence Approximation

Estimation of density derivatives is a versatile tool in statistical data analysis. A naive approach is to first estimate the density and then differentiate the estimate. However, this two-step approach does not work well, because a good density estimator is not necessarily a good density-derivative estimator. In this paper, we propose a method that approximates the density derivative directly, without estimating the density itself. The proposed estimator gives an analytic and computationally efficient approximation of multi-dimensional, high-order density derivatives, and all of its hyper-parameters can be chosen objectively by cross-validation. We further show that the proposed density-derivative estimator improves the accuracy of non-parametric KL-divergence estimation via metric learning. The practical superiority of the proposed method is demonstrated experimentally on change detection and feature selection.
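The key trick behind direct estimation can be shown with a one-dimensional, first-order least-squares sketch. Fit $g(x) = \sum_i \theta_i \psi_i(x)$ to $p'(x)$ by minimising $\int (g - p')^2\,dx$. Integration by parts turns the unknown cross term into a sample average, $\int g\,p'\,dx = -\mathbb{E}[g'(X)]$, so $p$ itself is never estimated. The Gaussian basis, the grid of centres, and the fixed values `sigma=0.5` and `lam=0.1` are assumptions made for illustration. The paper's estimator handles multi-dimensional, high-order derivatives and selects hyper-parameters by cross-validation instead.

```python
import numpy as np


def fit_density_derivative(x, centers, sigma=0.5, lam=0.1):
    """Hypothetical helper: least-squares fit of g(x) = sum_i theta_i psi_i(x)
    to p'(x) in 1-D, with psi_i(x) = exp(-(x - c_i)^2 / (2 sigma^2))."""
    diff_cc = centers[:, None] - centers[None, :]
    # G_ij = integral of psi_i(x) psi_j(x) dx, available in closed form
    G = np.sqrt(np.pi) * sigma * np.exp(-diff_cc**2 / (4.0 * sigma**2))
    # Integration by parts: integral of psi_i p' dx = -E[psi_i'(X)],
    # and -psi_i'(x) = (x - c_i) / sigma^2 * psi_i(x)
    diff_xc = x[:, None] - centers[None, :]
    psi = np.exp(-diff_xc**2 / (2.0 * sigma**2))
    h = np.mean(diff_xc / sigma**2 * psi, axis=0)
    # Minimise theta^T G theta - 2 theta^T h + lam * theta^T theta
    theta = np.linalg.solve(G + lam * np.eye(len(centers)), h)

    def g(t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        basis = np.exp(-(t[:, None] - centers[None, :])**2 / (2.0 * sigma**2))
        return basis @ theta

    return g


rng = np.random.default_rng(0)
x = rng.normal(size=10000)             # sample from N(0, 1)
centers = np.linspace(-4.0, 4.0, 30)   # fixed grid of basis centres (assumption)
g = fit_density_derivative(x, centers)
est = g([-1.0, 0.0, 1.0])
# Exact N(0, 1) derivative: p'(x) = -x exp(-x^2 / 2) / sqrt(2 pi),
# roughly 0.242, 0 and -0.242 at these points
print(est)
```

Every quantity except $h$ is analytic, which is why this style of estimator is cheap: one linear solve replaces the density-estimation step.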

### Direct Estimation of Information Divergence Using Nearest Neighbor Ratios

We propose a direct estimation method for Rényi and f-divergence measures based on a new graph-theoretical interpretation. Suppose that we are given two sample sets $X$ and $Y$ with $N$ and $M$ samples, respectively, where $\eta:=M/N$ is a constant. Consider the $k$-nearest neighbor ($k$-NN) graph of $Y$ in the joint data set $(X,Y)$. We show that, averaged over the points of $Y$, the powered ratio of the number of $X$ points to the number of $Y$ points among the $k$-NN points is proportional to the Rényi divergence between the $X$ and $Y$ densities. A similar approach can be used to estimate f-divergence measures. We derive bias and variance rates and show that, for the class of $\gamma$-Hölder smooth functions, the estimator achieves the MSE rate of $O(N^{-2\gamma/(\gamma+d)})$. Furthermore, using a weighted ensemble estimation technique, we derive an ensemble estimator that achieves the parametric MSE rate of $O(1/N)$. This holds for density functions with continuous and bounded derivatives up to order $d$, under some extra conditions at the boundary of the support set. Our estimators are computationally more tractable than competing estimators, which makes them appealing in many practical applications.
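The counting step can be sketched directly. For each $Y_i$, find its $k$ nearest neighbours in the joint sample, excluding $Y_i$ itself. Count the $N_i$ points from $X$ and the $M_i$ points from $Y$. Then $\eta N_i / M_i$ estimates $f_X(Y_i)/f_Y(Y_i)$, and averaging its $\alpha$-th power over $Y$ estimates $\int f_X^\alpha f_Y^{1-\alpha}$, which gives $D_\alpha(f_X \| f_Y) = \frac{1}{\alpha-1}\log\int f_X^\alpha f_Y^{1-\alpha}$. This is a minimal sketch under several assumptions. It uses $M_i + 1$ in the denominator to avoid division by zero, a brute-force distance matrix, and the illustrative defaults `k=40` and `alpha=0.5`. It omits the ensemble weighting that gives the $O(1/N)$ rate. The name `nnr_renyi` is hypothetical.

```python
import numpy as np


def nnr_renyi(X, Y, k=40, alpha=0.5):
    """Hypothetical helper: nearest-neighbour-ratio sketch of D_alpha(f_X || f_Y)."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    N, M = len(X), len(Y)
    eta = M / N
    Z = np.vstack([X, Y])  # joint sample; rows N .. N+M-1 are the Y points
    # Squared Euclidean distances from every Y point to every joint point
    d2 = (Y**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * Y @ Z.T
    d2[np.arange(M), N + np.arange(M)] = np.inf  # exclude each Y_i itself
    nn = np.argpartition(d2, k - 1, axis=1)[:, :k]  # indices of the k nearest
    n_x = (nn < N).sum(axis=1)  # N_i: X points among the k-NN
    m_y = k - n_x               # M_i: Y points among the k-NN
    # eta * N_i / M_i estimates f_X / f_Y at Y_i; the "+ 1" is an assumption
    ratio = eta * n_x / (m_y + 1.0)
    return float(np.log(np.mean(ratio**alpha)) / (alpha - 1.0))


rng = np.random.default_rng(1)
Y = rng.normal(0.0, 1.0, 1500)
d_diff = nnr_renyi(rng.normal(1.0, 1.0, 1500), Y)  # true D_0.5 = 0.5 * 1^2 / 2 = 0.25
d_same = nnr_renyi(rng.normal(0.0, 1.0, 1500), Y)  # true value 0
print(d_diff, d_same)
```

For large sample sets, a k-d tree or another spatial index would replace the $O(NM)$ distance matrix. The counting logic stays the same.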

### Relative Density-Ratio Estimation for Robust Distribution Comparison

Divergence estimators based on direct approximation of density ratios, without separately approximating the numerator and denominator densities, have been successfully applied to machine learning tasks that involve distribution comparison, such as outlier detection, transfer learning, and two-sample homogeneity testing. However, density-ratio functions often fluctuate strongly, so divergence estimation remains challenging in practice. In this paper, we propose to use relative divergences for distribution comparison, which involves approximating relative density ratios. Since relative density ratios are always smoother than the corresponding ordinary density ratios, the proposed method is favorable in terms of non-parametric convergence speed. Furthermore, we show that, under a parametric setup, the proposed divergence estimator has an asymptotic variance that is independent of the model complexity. This implies that the estimator hardly overfits, even with complex models. Through experiments, we demonstrate the usefulness of the proposed approach.
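As a rough illustration, consider the relative ratio $r_\alpha(x) = p(x) / (\alpha p(x) + (1-\alpha) q(x))$, which is bounded above by $1/\alpha$. It can be fitted with a kernel model by regularised least squares. The fitted model then yields the relative Pearson divergence $\mathrm{PE}_\alpha = \mathbb{E}_p[r_\alpha] - \tfrac{\alpha}{2}\mathbb{E}_p[r_\alpha^2] - \tfrac{1-\alpha}{2}\mathbb{E}_q[r_\alpha^2] - \tfrac12$. The sketch below assumes Gaussian kernels centred on numerator samples, with fixed `sigma`, `lam`, and `n_centers` rather than cross-validated ones. The function names are hypothetical.

```python
import numpy as np


def gauss_kernel(A, B, sigma):
    """Gaussian kernel matrix exp(-||a - b||^2 / (2 sigma^2)) between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :])**2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))


def relative_pe(x_p, x_q, alpha=0.5, sigma=1.0, lam=0.05, n_centers=100, seed=0):
    """Hypothetical helper: least-squares sketch of the alpha-relative Pearson
    divergence PE_alpha(P || Q), with r_alpha = p / (alpha p + (1 - alpha) q)."""
    rng = np.random.default_rng(seed)
    x_p = np.asarray(x_p, dtype=float).reshape(len(x_p), -1)
    x_q = np.asarray(x_q, dtype=float).reshape(len(x_q), -1)
    idx = rng.choice(len(x_p), size=min(n_centers, len(x_p)), replace=False)
    centers = x_p[idx]
    K_p = gauss_kernel(x_p, centers, sigma)  # basis evaluated at numerator samples
    K_q = gauss_kernel(x_q, centers, sigma)  # basis evaluated at denominator samples
    # Minimise 0.5 theta^T H theta - h^T theta + 0.5 lam theta^T theta
    H = alpha * K_p.T @ K_p / len(x_p) + (1.0 - alpha) * K_q.T @ K_q / len(x_q)
    h = K_p.mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    g_p, g_q = K_p @ theta, K_q @ theta  # fitted r_alpha at both samples
    return float(np.mean(g_p) - 0.5 * alpha * np.mean(g_p**2)
                 - 0.5 * (1.0 - alpha) * np.mean(g_q**2) - 0.5)


rng = np.random.default_rng(2)
x_p = rng.normal(0.0, 1.0, 1000)
pe_diff = relative_pe(x_p, rng.normal(1.5, 1.0, 1000))  # shifted distribution
pe_same = relative_pe(x_p, rng.normal(0.0, 1.0, 1000))  # same distribution
print(pe_diff, pe_same)
```

Because $r_\alpha \le 1/\alpha$, the target function stays bounded even where $q$ is nearly zero. This is the smoothness advantage the abstract refers to, and it is why a single global kernel width tends to suffice here.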

### Green's function based unparameterised multi-dimensional kernel density and likelihood ratio estimator

This paper introduces a probability density estimator based on Green's function identities. A density model is constructed under the sole assumption that the probability density is differentiable. The method is implemented as a binary likelihood estimator for classification purposes, so issues such as mis-modeling and overtraining are also discussed. The identity behind the density estimator can be interpreted as a real-valued, non-scalar kernel method which is able to reconstruct differentiable density functions.
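For orientation only, here is one standard Green's-function identity of this general type. It is stated as background, not as the paper's exact construction. Let $G$ be the free-space Green's function of the Laplacian in $\mathbb{R}^d$, so that $\Delta G = \delta$. Any differentiable density $p$ that decays at infinity can then be reconstructed from its gradient through a vector-valued (non-scalar) kernel $\nabla G$:

```latex
p(x) \;=\; \int_{\mathbb{R}^d} G(x-y)\,\Delta p(y)\,dy
     \;=\; -\int_{\mathbb{R}^d} \nabla_y G(x-y)\cdot\nabla p(y)\,dy,
\qquad
G(r) =
\begin{cases}
  \tfrac{1}{2}\,|r|, & d = 1,\\[2pt]
  \tfrac{1}{2\pi}\,\ln|r|, & d = 2,\\[2pt]
  -\dfrac{1}{(d-2)\,\omega_d\,|r|^{d-2}}, & d \ge 3,
\end{cases}
```

Here $\omega_d$ is the surface area of the unit sphere in $\mathbb{R}^d$. The first equality needs $p$ to be twice differentiable. The second form, obtained by integrating by parts with vanishing boundary terms, needs only first derivatives, which matches a model built on differentiability alone. In $d = 1$ the check is immediate: $-\int \partial_y\big(\tfrac12|x-y|\big)\,p'(y)\,dy = \tfrac12\big(\int_{-\infty}^{x} p' - \int_{x}^{\infty} p'\big) = p(x)$.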