Goto

Collaborating Authors

 Performance Analysis


Learning structured densities via infinite dimensional exponential families

Neural Information Processing Systems

Learning the structure of a probabilistic graphical models is a well studied problem in the machine learning community due to its importance in many applications. Current approaches are mainly focused on learning the structure under restrictive parametric assumptions, which limits the applicability of these methods. In this paper, we study the problem of estimating the structure of a probabilistic graphical model without assuming a particular parametric model. We consider probabilities that are members of an infinite dimensional exponential family [4], which is parametrized by a reproducing kernel Hilbert space (RKHS) H and its kernel k . One difficulty in learning nonparametric densities is the evaluation of the normalizing constant. In order to avoid this issue, our procedure minimizes the penalized score matching objective [10, 11]. We show how to efficiently minimize the proposed objective using existing group lasso solvers. Furthermore, we prove that our procedure recovers the graph structure with high-probability under mild conditions. Simulation studies illustrate ability of our procedure to recover the true graph structure without the knowledge of the data generating process.


Regularization Path of Cross-Validation Error Lower Bounds

Neural Information Processing Systems

Careful tuning of a regularization parameter is indispensable in many machine learning tasks because it has a significant impact on generalization performances. Nevertheless, current practice of regularization parameter tuning is more of an art than a science, e.g., it is hard to tell how many grid-points would be needed in cross-validation (CV) for obtaining a solution with sufficiently small CV error. In this paper we propose a novel framework for computing a lower bound of the CV errors as a function of the regularization parameter, which we call regularization path of CV error lower bounds . The proposed framework can be used for providing a theoretical approximation guarantee on a set of solutions in the sense that how far the CV error of the current best solution could be away from best possible CV error in the entire range of the regularization parameters. Our numerical experiments demonstrate that a theoretically guaranteed choice of a regularization parameter in the above sense is possible with reasonable computational costs.





Consistent Feature Selection for Analytic Deep Neural Networks

Neural Information Processing Systems

One of the most important steps toward model interpretability is feature selection, which aims to identify the subset of relevant features with respect to an outcome.


Analysis of Robust PCA via Local Incoherence

Neural Information Processing Systems

We investigate the robust PCA problem of decomposing an observed matrix into the sum of a low-rank and a sparse error matrices via convex programming Principal Component Pursuit (PCP). In contrast to previous studies that assume the support of the error matrix is generated by uniform Bernoulli sampling, we allow non-uniform sampling, i.e., entries of the low-rank matrix are corrupted by errors with unequal probabilities. We characterize conditions on error corruption of each individual entry based on the local incoherence of the low-rank matrix, under which correct matrix decomposition by PCP is guaranteed. Such a refined analysis of robust PCA captures how robust each entry of the low rank matrix combats error corruption. In order to deal with non-uniform error corruption, our technical proof introduces a new weighted norm and develops/exploits the concentration properties that such a norm satisfies.


I: Multi-modal Models Membership Inference Pingyi Hu

Neural Information Processing Systems

Those scores are then averaged over the whole corpus to reach an overall quality. The MSCOCO dataset is one of the most representative large-scale labeled image datasets available to the public. It is also the most authoritative and important benchmark in the current target recognition, detection and other fields. Its image data source is Y ahoo's photo album website, Flickr. Most of the images in the dataset display a human being involved in an activity.