Goto

Collaborating Authors

 kliep


Wen

AAAI Conferences

Covariate shift is a fundamental problem for learning in non-stationary environments where the conditional distribution p(y x) is the same between training and test data while their marginal distributions ptr(x) and pte(x) are different. Although many covariate shift correction techniques remain effective for real world problems, most do not scale well in practice. In this paper, using inspiration from recent optimization techniques, we apply the Frank-Wolfe algorithm to two well-known covariate shift correction techniques, Kernel Mean Matching (KMM) and Kullback-Leibler Importance Estimation Procedure (KLIEP), and identify an important connection between kernel herding and KMM. Our complexity analysis shows the benefits of the Frank-Wolfe approach over projected gradient methods in solving KMM and KLIEP. An empirical study then demonstrates the effectiveness and efficiency of the Frank-Wolfe algorithm for correcting covariate shift in practice.


Nearest Neighbor-based Importance Weighting

arXiv.org Machine Learning

Importance weighting is widely applicable in machine learning in general and in techniques dealing with data covariate shift problems in particular. A novel, direct approach to determine such importance weighting is presented. It relies on a nearest neighbor classification scheme and is relatively straightforward to implement. Comparative experiments on various classification tasks demonstrate the effectiveness of our so-called nearest neighbor weighting (NNeW) scheme. Considering its performance, our procedure can act as a simple and effective baseline method for importance weighting.


Cost Sensitive Learning in the Presence of Symmetric Label Noise

arXiv.org Machine Learning

In binary classification framework, we are interested in making cost sensitive label predictions in the presence of uniform/symmetric label noise. We first observe that $0$-$1$ Bayes classifiers are not (uniform) noise robust in cost sensitive setting. To circumvent this impossibility result, we present two schemes; unlike the existing methods, our schemes do not require noise rate. The first one uses $\alpha$-weighted $\gamma$-uneven margin squared loss function, $l_{\alpha, usq}$, which can handle cost sensitivity arising due to domain requirement (using user given $\alpha$) or class imbalance (by tuning $\gamma$) or both. However, we observe that $l_{\alpha, usq}$ Bayes classifiers are also not cost sensitive and noise robust. We show that regularized ERM of this loss function over the class of linear classifiers yields a cost sensitive uniform noise robust classifier as a solution of a system of linear equations. We also provide a performance bound for this classifier. The second scheme that we propose is a re-sampling based scheme that exploits the special structure of the uniform noise models and uses in-class probability estimates. Our computational experiments on some UCI datasets with class imbalance show that classifiers of our two schemes are on par with the existing methods and in fact better in some cases w.r.t. Accuracy and Arithmetic Mean, without using/tuning noise rate. We also consider other cost sensitive performance measures viz., F measure and Weighted Cost for evaluation.


Interpreting Outliers: Localized Logistic Regression for Density Ratio Estimation

arXiv.org Machine Learning

We propose an inlier-based outlier detection method capable of both identifying the outliers and explaining why they are outliers, by identifying the outlier-specific features. Specifically, we employ an inlier-based outlier detection criterion, which uses the ratio of inlier and test probability densities as a measure of plausibility of being an outlier. For estimating the density ratio function, we propose a localized logistic regression algorithm. Thanks to the locality of the model, variable selection can be outlier-specific, and will help interpret why points are outliers in a high-dimensional space. Through synthetic experiments, we show that the proposed algorithm can successfully detect the important features for outliers. Moreover, we show that the proposed algorithm tends to outperform existing algorithms in benchmark datasets.


Learning Sparse Structural Changes in High-dimensional Markov Networks: A Review on Methodologies and Theories

arXiv.org Machine Learning

For example, genes may regulate each other in different ways when external conditions are changed; the number of daily flu-like symptom reports in nearby hospitals may become correlated when a major epidemic disease breaks out; EEG signals from different regions of the brain may be synchronized/desynchronized when the subject is performing different activities. Spotting such changes in interactions may provide key insights into the underlying system. The interactions among random variables can be formulated as undirected probabilistic graphical models, or Markov Networks (MNs) [Koller and Friedman, 2009], expressing the interactions via the conditional independence. We consider a simple model: the pairwise MNs where the links are only encoded for single or pairs of random variables. Due to the Hammersley-Clifford theorem [Hammersley and Clifford, 1971], the underlying joint probability density function can be represented as the product of univariate and bivariate factors.


Support Consistency of Direct Sparse-Change Learning in Markov Networks

arXiv.org Machine Learning

We study the problem of learning sparse structure changes between two Markov networks $P$ and $Q$. Rather than fitting two Markov networks separately to two sets of data and figuring out their differences, a recent work proposed to learn changes \emph{directly} via estimating the ratio between two Markov network models. In this paper, we give sufficient conditions for \emph{successful change detection} with respect to the sample size $n_p, n_q$, the dimension of data $m$, and the number of changed edges $d$. When using an unbounded density ratio model we prove that the true sparse changes can be consistently identified for $n_p = \Omega(d^2 \log \frac{m^2+m}{2})$ and $n_q = \Omega({n_p^2})$, with an exponentially decaying upper-bound on learning error. Such sample complexity can be improved to $\min(n_p, n_q) = \Omega(d^2 \log \frac{m^2+m}{2})$ when the boundedness of the density ratio model is assumed. Our theoretical guarantee can be applied to a wide range of discrete/continuous Markov networks.


Correcting Covariate Shift with the Frank-Wolfe Algorithm

AAAI Conferences

Covariate shift is a fundamental problem for learning in non-stationary environments where the conditional distribution p(y|x) is the same between training and test data while their marginal distributions p tr (x) and p te (x) are different. Although many covariate shift correction techniques remain effective for real world problems, most do not scale well in practice. In this paper, using inspiration from recent optimization techniques, we apply the Frank-Wolfe algorithm to two well-known covariate shift correction techniques, Kernel Mean Matching (KMM) and Kullback-Leibler Importance Estimation Procedure (KLIEP), and identify an important connection between kernel herding and KMM. Our complexity analysis shows the benefits of the Frank-Wolfe approach over projected gradient methods in solving KMM and KLIEP. An empirical study then demonstrates the effectiveness and efficiency of the Frank-Wolfe algorithm for correcting covariate shift in practice.


Support Consistency of Direct Sparse-Change Learning in Markov Networks

AAAI Conferences

We study the problem of learning sparse structure changes between two Markov networks P and Q. Rather than fitting two Markov networks separately to two sets of data and figuring out their differences, a recent work proposed to learn changes directly via estimating the ratio between two Markov network models.  Such a direct approach was demonstrated to perform excellently in experiments, although its theoretical properties remained unexplored.  In this paper, we give sufficient conditions for successful change detection with respect to the sample size np, nq, the dimension of data m, and the number of changed edges d.


Direct Learning of Sparse Changes in Markov Networks by Density Ratio Estimation

arXiv.org Machine Learning

We propose a new method for detecting changes in Markov network structure between two sets of samples. Instead of naively fitting two Markov network models separately to the two data sets and figuring out their difference, we \emph{directly} learn the network structure change by estimating the ratio of Markov network models. This density-ratio formulation naturally allows us to introduce sparsity in the network structure change, which highly contributes to enhancing interpretability. Furthermore, computation of the normalization term, which is a critical bottleneck of the naive approach, can be remarkably mitigated. We also give the dual formulation of the optimization problem, which further reduces the computation cost for large-scale Markov networks. Through experiments, we demonstrate the usefulness of our method.


Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection

Neural Information Processing Systems

We address the problem of estimating the ratio of two probability density functions (a.k.a.~the importance). The importance values can be used for various succeeding tasks such as non-stationarity adaptation or outlier detection. In this paper, we propose a new importance estimation method that has a closed-form solution; the leave-one-out cross-validation score can also be computed analytically. Therefore, the proposed method is computationally very efficient and numerically stable. We also elucidate theoretical properties of the proposed method such as the convergence rate and approximation error bound. Numerical experiments show that the proposed method is comparable to the best existing method in accuracy, while it is computationally more efficient than competing approaches.