A Unified Framework for Random Forest Prediction Error Estimation

arXiv.org Machine Learning

We introduce a unified framework for random forest prediction error estimation based on a novel estimator of the conditional prediction error distribution function. Our framework enables immediate estimation of key parameters often of interest, including conditional mean squared prediction errors, conditional biases, and conditional quantiles, by a straightforward plug-in routine. Our approach is particularly well-adapted for prediction interval estimation, which has received less attention in the random forest literature despite its practical utility; we show via simulations that our proposed prediction intervals are competitive with, and in some settings outperform, existing methods. To establish theoretical grounding for our framework, we prove pointwise uniform consistency of a more stringent version of our estimator of the conditional prediction error distribution. In addition to providing a suite of measures of prediction uncertainty, our general framework is applicable to many variants of the random forest algorithm. The estimators introduced here are implemented in the R package forestError.
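
The plug-in idea can be illustrated with a minimal sketch. It assumes we already hold a set of out-of-bag prediction errors and a non-negative weight for each one expressing its relevance to a given test point; how the paper actually constructs those weights is not stated in the abstract, so `errors`, `weights`, and the function names below are hypothetical. Errors are taken as observed minus predicted, which is also an assumption.

```python
def weighted_quantile(errors, weights, p):
    """Return the smallest error whose weighted empirical CDF reaches p."""
    pairs = sorted(zip(errors, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for error, weight in pairs:
        cumulative += weight
        if cumulative >= p * total:
            return error
    return pairs[-1][0]


def plug_in_summaries(errors, weights, point_prediction, alpha=0.05):
    """Plug the weighted empirical error distribution into each target.

    errors:  hypothetical out-of-bag errors, observed minus predicted
    weights: hypothetical relevance weights for the test point
    """
    total = sum(weights)
    bias = sum(w * e for e, w in zip(errors, weights)) / total
    mspe = sum(w * e * e for e, w in zip(errors, weights)) / total
    lower = weighted_quantile(errors, weights, alpha / 2)
    upper = weighted_quantile(errors, weights, 1 - alpha / 2)
    # Since error = observed - predicted, shift the error quantiles
    # by the point prediction to obtain a prediction interval.
    interval = (point_prediction + lower, point_prediction + upper)
    return {"bias": bias, "mspe": mspe, "interval": interval}
```

With equal weights this reduces to an unconditional summary of the out-of-bag errors; unequal weights are what would make the estimates conditional on the test point.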


Support Vector Machine Classifier via $L_{0/1}$ Soft-Margin Loss

arXiv.org Machine Learning

Support vector machines (SVMs) have attracted great attention over the last two decades due to their extensive applications, and numerous optimization models have been proposed. In this paper, we introduce a new model, distinct from these, equipped with an $L_{0/1}$ soft-margin loss (dubbed $L_{0/1}$-SVM), which captures the nature of binary classification well. Many existing convex and non-convex soft-margin losses can be viewed as surrogates of the $L_{0/1}$ soft-margin loss. Despite the discrete nature of $L_{0/1}$, we establish the existence of a global minimizer of the new model and reveal the relationship among its minimizers and KKT/P-stationary points. These theoretical properties allow us to take advantage of the alternating direction method of multipliers. In addition, the $L_{0/1}$-support vector operator is introduced as a filter to prevent outliers from being support vectors during the training process. Hence, the method is expected to be relatively robust. Finally, numerical experiments demonstrate that our proposed method performs better, with much shorter computational time and far fewer support vectors, than some other leading SVM methods. As the data size grows, its advantage becomes more evident.
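
To make the surrogate relationship concrete, here is a small sketch. It assumes the usual soft-margin form, with the loss applied to $t = 1 - y(w^\top x + b)$ and an objective of $\frac{1}{2}\|w\|^2 + C\sum_i \ell(t_i)$; the abstract does not spell out the model, so this form and all function names are assumptions. It only evaluates objectives and does not implement the paper's ADMM solver.

```python
def l01_soft_margin(t):
    """0/1 soft-margin loss: counts a margin violation, ignoring its size."""
    return 1.0 if t > 0 else 0.0


def hinge(t):
    """Hinge loss, a convex surrogate that grows with the violation."""
    return max(t, 0.0)


def svm_objective(w, b, X, y, C, loss):
    """0.5 * ||w||^2 + C * sum of loss(1 - y_i * (w . x_i + b))."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    penalty = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        penalty += loss(1.0 - yi * score)
    return regularizer + C * penalty


# Two well-separated points plus one far-away mislabeled outlier.
X = [[2.0], [-2.0], [-10.0]]
y = [1, -1, 1]
w, b = [1.0], 0.0

obj_01 = svm_objective(w, b, X, y, C=1.0, loss=l01_soft_margin)
obj_hinge = svm_objective(w, b, X, y, C=1.0, loss=hinge)
```

In this toy case the outlier contributes a penalty of 1 under the 0/1 loss and 11 under the hinge loss, which is one way to see why a count-based loss may be less sensitive to outliers.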


Learning Arbitrary Quantities of Interest from Expensive Black-Box Functions through Bayesian Sequential Optimal Design

arXiv.org Machine Learning

Estimating arbitrary quantities of interest (QoIs) that are non-linear operators of complex, expensive-to-evaluate, black-box functions is a challenging problem due to missing domain knowledge and finite budgets. Bayesian optimal design of experiments (BODE) is a family of methods that identify an optimal design of experiments (DOE) under different contexts, using only a limited number of function evaluations. Under BODE methods, sequential design of experiments (SDOE) accomplishes this task by selecting an optimal sequence of experiments while using data-driven probabilistic surrogate models instead of the expensive black-box function. Probabilistic predictions from the surrogate model are used to define an information acquisition function (IAF), which quantifies the marginal value contributed or the expected information gained by a hypothetical experiment. The next experiment is selected by maximizing the IAF. A generally applicable IAF is the expected information gain (EIG) about a QoI, as captured by the expectation of the Kullback-Leibler divergence between the predictive distribution of the QoI after doing a hypothetical experiment and the current predictive distribution about the same QoI. We model the underlying information source as a fully-Bayesian, non-stationary Gaussian process (FBNSGP), and derive an approximation of the information gain of a hypothetical experiment about an arbitrary QoI conditional on the hyper-parameters. The EIG about the same QoI is estimated by sample averages to integrate over the posterior of the hyper-parameters and the potential experimental outcomes. We demonstrate the performance of our method in four numerical examples and a practical engineering problem of steel wire manufacturing. The method is compared to two classic SDOE methods: random sampling and uncertainty sampling.
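
The EIG-as-expected-KL idea can be sketched in a deliberately simple setting. The code assumes a scalar QoI with a Gaussian current belief and an experiment that observes the QoI plus Gaussian noise. This conjugate toy model stands in for the paper's Gaussian process surrogate and omits the integration over hyper-parameters. All names are hypothetical.

```python
import math
import random


def kl_gaussian(m1, s1, m0, s0):
    """KL( N(m1, s1^2) || N(m0, s0^2) ) for univariate Gaussians."""
    return math.log(s0 / s1) + (s1**2 + (m1 - m0) ** 2) / (2 * s0**2) - 0.5


def posterior(m0, s0, outcome, noise_sd):
    """Conjugate update of the belief N(m0, s0^2) after one noisy outcome."""
    precision = 1 / s0**2 + 1 / noise_sd**2
    variance = 1 / precision
    mean = variance * (m0 / s0**2 + outcome / noise_sd**2)
    return mean, math.sqrt(variance)


def expected_information_gain(m0, s0, noise_sd, n_samples=20000, seed=0):
    """Sample average of KL(posterior || current) over simulated outcomes."""
    rng = random.Random(seed)
    predictive_sd = math.sqrt(s0**2 + noise_sd**2)
    total = 0.0
    for _ in range(n_samples):
        outcome = rng.gauss(m0, predictive_sd)  # hypothetical experiment
        m1, s1 = posterior(m0, s0, outcome, noise_sd)
        total += kl_gaussian(m1, s1, m0, s0)
    return total / n_samples


def select_next_experiment(candidates, m0, s0):
    """Pick the candidate (name -> noise_sd) with the largest estimated EIG."""
    return max(
        candidates,
        key=lambda name: expected_information_gain(m0, s0, candidates[name]),
    )
```

In this toy model the exact EIG is $\frac{1}{2}\log(1 + s_0^2/\sigma^2)$, so the sample average can be checked against a closed form; less noisy candidate experiments yield a larger EIG and are selected first.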


VLSI Mask Optimization: From Shallow To Deep Learning

arXiv.org Machine Learning

VLSI mask optimization is one of the most critical stages in manufacturability-aware design, and it is costly due to the complicated mask optimization and lithography simulation involved. Recent research has shown prominent advantages of machine learning techniques in dealing with complicated and big-data problems, which brings the potential for dedicated machine learning solutions to DFM problems and can facilitate the VLSI design cycle. In this paper, we focus on a heterogeneous OPC framework that assists mask layout optimization. Preliminary results show the efficiency and effectiveness of the proposed frameworks, which have the potential to be alternatives to existing EDA solutions.

I. Introduction. VLSI mask optimization is one of the most critical stages in manufacturability-aware design, and it is costly due to the complicated mask optimization and lithography simulation involved. Recent studies have shown prominent advantages of machine learning techniques in dealing with complicated and big-data problems, which brings the potential for dedicated machine learning solutions to DFM problems and can facilitate the VLSI design cycle [1, 2].


Latent Complete Row Space Recovery for Multi-view Subspace Clustering

arXiv.org Machine Learning

Multi-view subspace clustering has been applied to tasks such as image processing and video surveillance, and has attracted increasing attention. Most existing methods learn view-specific self-representation matrices and construct a combined affinity matrix from multiple views. The affinity construction process is time-consuming, and the combined affinity matrix is not guaranteed to reflect the whole true subspace structure. To overcome these issues, the Latent Complete Row Space Recovery (LCRSR) method is proposed. Concretely, LCRSR is based on the assumption that the multi-view observations are generated from an underlying latent representation, which is further assumed to collect the authentic samples drawn exactly from multiple subspaces. LCRSR is able to recover the row space of the latent representation, which not only carries complete information from multiple views but also determines the subspace membership under certain conditions. LCRSR does not involve the graph construction procedure and is solved with an efficient and convergent algorithm, making it more scalable to large datasets. The effectiveness and efficiency of LCRSR are validated by clustering various kinds of multi-view data and illustrated on a background subtraction task.


More Data Can Hurt for Linear Regression: Sample-wise Double Descent

arXiv.org Machine Learning

In this expository note we describe a surprising phenomenon in overparameterized linear regression, where the dimension exceeds the number of samples: there is a regime where the test risk of the estimator found by gradient descent increases with additional samples. In other words, more data actually hurts the estimator. This behavior is implicit in a recent line of theoretical works analyzing the "double-descent" phenomenon in linear models. In this note, we isolate and explain this behavior in an extremely simple setting: linear regression with isotropic Gaussian covariates. In particular, it occurs due to an unconventional type of bias-variance tradeoff in the overparameterized regime: the bias decreases with more samples, but the variance increases.
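
The tradeoff can be checked numerically with a short sketch. It assumes gradient descent starts from zero, so that it converges to the minimum-norm interpolating solution. It also assumes the standard closed-form expectations for that solution with isotropic Gaussian covariates: bias $(1 - n/d)\|\beta\|^2$ and variance $\sigma^2 n/(d - n - 1)$ for $n \le d - 2$. The note's own presentation may differ, and the function name and parameter values are illustrative.

```python
def expected_excess_risk(n, d, beta_norm_sq, noise_var):
    """Bias, variance, and their sum for the minimum-norm solution.

    n: number of samples, d: dimension (requires n <= d - 2),
    beta_norm_sq: squared norm of the true coefficient vector,
    noise_var: variance of the label noise.
    """
    if not 0 <= n <= d - 2:
        raise ValueError("closed form used here needs 0 <= n <= d - 2")
    bias = (1 - n / d) * beta_norm_sq          # shrinks as n grows
    variance = noise_var * n / (d - n - 1)     # blows up as n approaches d
    return bias, variance, bias + variance


if __name__ == "__main__":
    d, beta_norm_sq, noise_var = 100, 1.0, 0.1
    for n in (10, 50, 68, 80, 90, 98):
        bias, variance, risk = expected_excess_risk(n, d, beta_norm_sq, noise_var)
        print(f"n={n:3d}  bias={bias:.3f}  variance={variance:.3f}  risk={risk:.3f}")
```

With these illustrative values the expected excess risk falls until roughly n = 68 and then rises as n approaches d, which is the regime where additional samples hurt.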


Fairness Assessment for Artificial Intelligence in Financial Industry

arXiv.org Machine Learning

Artificial Intelligence (AI) is an important driving force for the development and transformation of the financial industry. However, with fast-evolving AI technology and applications, unintentional bias, insufficient model validation, immature contingency plans, and other underestimated threats may expose a company to operational and reputational risks. In this paper, we focus on fairness evaluation, one of the key components of AI governance, through a quantitative lens. Statistical methods are reviewed for imbalanced data treatment and bias mitigation. These methods and fairness evaluation metrics are then applied to a credit card default payment example.


Deep Efficient End-to-end Reconstruction (DEER) Network for Low-dose Few-view Breast CT from Projection Data

arXiv.org Machine Learning

Breast CT provides image volumes with isotropic resolution in high contrast, enabling detection of calcifications (down to a few hundred microns in size) and subtle density differences. Since the breast is sensitive to x-ray radiation, dose reduction in breast CT is an important topic, and low-dose few-view scanning is a main approach for this purpose. In this article, we propose a Deep Efficient End-to-end Reconstruction (DEER) network for low-dose few-view breast CT. The major merits of our network include high dose efficiency, excellent image quality, and low model complexity. By design, the proposed network can learn the reconstruction process with as few as O(N) parameters, where N is the size of an image to be reconstructed, which represents an orders-of-magnitude improvement relative to state-of-the-art deep-learning-based reconstruction methods that map projection data to tomographic images directly. As a result, our method does not require expensive GPUs to train and run. Also, validated on a cone-beam breast CT dataset prepared by Koning Corporation on a commercial scanner, our method demonstrates competitive performance relative to state-of-the-art reconstruction networks in terms of image quality.


A Rigorous Theory of Conditional Mean Embeddings

arXiv.org Machine Learning

Conditional mean embeddings (CME) have proven themselves to be a powerful tool in many machine learning applications. They allow the efficient conditioning of probability distributions within the corresponding reproducing kernel Hilbert spaces (RKHSs) by providing a linear-algebraic relation for the kernel mean embeddings of the respective probability distributions. Both centered and uncentered covariance operators have been used to define CMEs in the existing literature. In this paper, we develop a mathematically rigorous theory for both variants, discuss the merits and problems of either, and significantly weaken the conditions for applicability of CMEs. In the course of this, we demonstrate a beautiful connection to Gaussian conditioning in Hilbert spaces.
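
For orientation, the analogy with Gaussian conditioning can be written out formally. The block below is a sketch in commonly used notation, not the paper's precise statement: the symbols ($\varphi$ for the feature map, $C$ for centered and $\tilde{C}$ for uncentered (cross-)covariance operators) are assumptions, and the inverses are written informally, ignoring the existence and domain questions that a rigorous treatment has to settle.

```latex
% Gaussian conditioning for jointly Gaussian (X, Y) in finite dimensions:
\mathbb{E}[Y \mid X = x] = \mu_Y + \Sigma_{YX}\,\Sigma_X^{-1}\,(x - \mu_X)

% Formal analogue for the CME with centered (cross-)covariance operators,
% where \mu_X, \mu_Y are kernel mean embeddings and \varphi is the feature map:
\mu_{Y \mid X = x} = \mu_Y + C_{YX}\, C_X^{-1}\, \bigl(\varphi(x) - \mu_X\bigr)

% Uncentered variant, with uncentered operators marked by a tilde:
\mu_{Y \mid X = x} = \tilde{C}_{YX}\, \tilde{C}_X^{-1}\, \varphi(x)
```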


Efficient adjustment sets for population average treatment effect estimation in non-parametric causal graphical models

arXiv.org Machine Learning

The method of covariate adjustment is often used for estimation of population average treatment effects in observational studies. Graphical rules for determining all valid covariate adjustment sets from an assumed causal graphical model are well known. Restricting attention to causal linear models, a recent article derived two novel graphical criteria: one to compare the asymptotic variance of linear regression treatment effect estimators that control for certain distinct adjustment sets, and another to identify the optimal adjustment set that yields the least squares treatment effect estimator with the smallest asymptotic variance among consistent adjusted least squares estimators. In this paper we show that the same graphical criteria can be used in non-parametric causal graphical models when treatment effects are estimated by contrasts involving non-parametrically adjusted estimators of the interventional means. We also provide a graphical criterion for determining the optimal adjustment set among the minimal adjustment sets, which is valid for both linear and non-parametric estimators. We provide a new graphical criterion for comparing time-dependent adjustment sets, that is, sets comprising covariates that adjust for future treatments and that are themselves affected by earlier treatments. We show by example that uniformly optimal time-dependent adjustment sets do not always exist. In addition, for point interventions, we provide a sound and complete graphical criterion for determining when a non-parametric optimally adjusted estimator of an interventional mean, or of a contrast of interventional means, is as efficient as an efficient estimator of the same parameter that exploits the information in the conditional independencies encoded in the non-parametric causal graphical model.
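
As a reminder of what a non-parametrically adjusted estimator of an interventional mean looks like in the simplest case, the sketch below stratifies on a discrete adjustment set and averages the stratum-specific outcome means over the marginal distribution of the covariates. It assumes the chosen set is a valid adjustment set and does not implement any of the graphical criteria. The data layout and function names are hypothetical.

```python
from collections import defaultdict


def adjusted_mean(records, treatment):
    """Estimate E[Y under A=treatment] as sum_z P(Z=z) * E[Y | A=treatment, Z=z].

    records: list of (z, a, y) tuples, with z a hashable covariate value
    assumed to form a valid adjustment set.
    """
    stratum_size = defaultdict(int)
    outcome_sum = defaultdict(float)
    treated_count = defaultdict(int)
    for z, a, y in records:
        stratum_size[z] += 1
        if a == treatment:
            outcome_sum[z] += y
            treated_count[z] += 1
    n = sum(stratum_size.values())
    estimate = 0.0
    for z, size in stratum_size.items():
        if treated_count[z] == 0:
            # Positivity fails: no unit in this stratum received the treatment.
            raise ValueError(f"no observations with A={treatment!r} in stratum {z!r}")
        estimate += (size / n) * (outcome_sum[z] / treated_count[z])
    return estimate


def adjusted_ate(records):
    """Contrast of adjusted interventional means for a binary treatment."""
    return adjusted_mean(records, 1) - adjusted_mean(records, 0)


# Toy data: (z, a, y). Treatment is more common in the high-outcome stratum.
data = [(0, 0, 1.0), (0, 0, 3.0), (0, 1, 4.0),
        (1, 0, 10.0), (1, 1, 12.0), (1, 1, 14.0)]
```

On this toy data the adjusted contrast is 2.5, while the unadjusted difference in means between treated and untreated units is about 5.33, because treatment is more frequent in the stratum with higher outcomes.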