 Accuracy


Extraction of Complex DNN Models: Real Threat or Boogeyman?

arXiv.org Machine Learning

Recently, machine learning (ML) has introduced advanced solutions to many domains. Since ML models provide a business advantage to model owners, protecting the intellectual property (IP) of ML models has emerged as an important consideration. Confidentiality of ML models can be protected by exposing them to clients only via prediction APIs. However, model extraction attacks can steal the functionality of ML models using the information leaked to clients through the results returned via the API. In this work, we question whether model extraction is a serious threat to complex, real-life ML models. We evaluate the current state-of-the-art model extraction attack (the Knockoff attack) against complex models. We reproduce and confirm the results in the Knockoff attack paper. But we also show that the performance of this attack can be limited by several factors, including ML model architecture and the granularity of the API response. Furthermore, we introduce a defense based on distinguishing queries used for the Knockoff attack from benign queries. Despite the limitations of the Knockoff attack, we show that a more realistic adversary can effectively steal complex ML models and evade known defenses.


7 Things You Should Know about ROC AUC

#artificialintelligence

Models for different classification problems can be fitted by trying to maximize or minimize various performance measures. Measurements that address one aspect of a model's performance but not another are important to note so that we can make an informed decision and select the performance measures that best fit our design. ROC AUC is commonly used in many fields as a prominent measure to evaluate classifier performance, and researchers might favor one classifier over another due to a higher AUC. For a refresher on ROC AUC, a clear and concise explanation can be found here. If you are totally unfamiliar with ROC AUC you may find that this post digs into the subject a bit too deep, but I hope you will still find it useful or bookmark it for future reference.


bootstrapping

#artificialintelligence

This wiki is about bootstrapping. "Recipe for yogurt: Add yogurt to milk." - Anon. Also see http://bootstrappable.org, which has pointers to a mailing list and IRC channel. Simple explanation: bootstrapping is about building a compiler using tools smaller than itself, as opposed to building a compiler using an already built version of itself. The problem with the second is: Where did that prebuilt binary come from?


Online control of the familywise error rate

arXiv.org Machine Learning

Specifically, without knowing the future $p$-values, the analyst must irrevocably decide at each step whether to reject the null, such that with probability at least $1-\alpha$, there are no false rejections in the entire sequence. This paper unifies algorithm design concepts developed for offline FWER control and for online false discovery rate (FDR) control. Though Bonferroni, fallback procedures and Sidak's method can trivially be extended to the online setting, our main contribution is the design of new, adaptive online algorithms that control the FWER and per-family error rate (PFER) when the $p$-values are independent or locally dependent in time. Our experiments demonstrate substantial gains in power, also formally proved in an idealized Gaussian model.

1 Introduction

Online multiple testing refers to the setting in which a potentially infinite stream of hypotheses $H_1, H_2, \ldots$ (respectively $p$-values $P_1, P_2, \ldots$) is tested sequentially one at a time. At each step $t \in \mathbb{N}$, one must decide whether to reject the current null hypothesis $H_t$ or not, without knowing the outcomes of all the future tests. Typically, we reject the null hypothesis when $P_t$ is smaller than some threshold $\alpha_t$. Let $\mathcal{R}$ represent the set of rejected null hypotheses, and $\mathcal{H}_0$ be the unknown set of true null hypotheses; then, $\mathcal{V} = \mathcal{R} \cap \mathcal{H}_0$ is the set of incorrectly rejected null hypotheses, also known as false discoveries. Denoting $V = |\mathcal{V}|$, some common error metrics are the false discovery rate (FDR), familywise error rate (FWER), per-family error rate (PFER) and power, which are defined as $\mathrm{FDR} = \mathbb{E}\left[\frac{V}{|\mathcal{R}| \vee 1}\right]$, $\mathrm{FWER} = \Pr\{V \geq 1\}$, $\mathrm{PFER} = \mathbb{E}[V]$, $\mathrm{power} = \mathbb{E}\left[\frac{|\mathcal{H}_0^c \cap \mathcal{R}|}{|\mathcal{H}_0^c|}\right]$.
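The simple online extension of Bonferroni mentioned in this snippet can be sketched as follows. This is a minimal illustration, not the adaptive algorithms the paper proposes: it assumes a fixed nonnegative sequence $\gamma_t$ summing to at most 1 and tests each hypothesis at level $\alpha_t = \alpha \gamma_t$, so that a union bound keeps the FWER at or below $\alpha$. The particular choice $\gamma_t = 6/(\pi^2 t^2)$ and the function names are assumptions made for the example.

```python
import math


def online_bonferroni_level(alpha, t):
    """Test level alpha_t = alpha * gamma_t for step t (1-indexed).

    gamma_t = 6 / (pi^2 * t^2) is an assumed spending sequence; it sums
    to 1 over t = 1, 2, ..., so the levels sum to alpha.
    """
    return alpha * 6.0 / (math.pi ** 2 * t ** 2)


def online_bonferroni(p_values, alpha=0.05):
    """Make an irrevocable reject decision for each p-value in arrival order.

    Each decision uses only the current p-value and its step index,
    never any future p-values.
    """
    decisions = []
    for t, p in enumerate(p_values, start=1):
        decisions.append(p <= online_bonferroni_level(alpha, t))
    return decisions
```

Because the levels shrink like $1/t^2$, hypotheses arriving late are tested very conservatively, which is the loss of power that adaptive procedures aim to reduce.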


Comparison of Generative Adversarial Networks Architectures Which Reduce Mode Collapse

arXiv.org Machine Learning

Generative Adversarial Networks are known for their high quality outputs and versatility. However, they also suffer from mode collapse in their output data distribution. There have been many efforts to revamp the GAN model and reduce mode collapse. This paper focuses on two of these models, PacGAN and VEEGAN. This paper explains the mathematical theory behind the aforementioned models, and compares their degree of mode collapse with vanilla GAN using MNIST digits as input data. The result indicates that PacGAN performs slightly better than vanilla GAN in terms of mode collapse, and VEEGAN performs worse than both PacGAN and vanilla GAN. VEEGAN's poor performance may be attributed to average autoencoder loss in its objective function and small penalty for blurry features.


Learning Only from Relevant Keywords and Unlabeled Documents

arXiv.org Machine Learning

We consider a document classification problem where document labels are absent and only relevant keywords of a target class and unlabeled documents are given. Although heuristic methods based on pseudo-labeling have been considered, theoretical understanding of this problem is still limited. Moreover, previous methods cannot easily incorporate well-developed techniques in supervised text classification. In this paper, we propose a theoretically guaranteed learning framework that is simple to implement and has flexible choices of models, e.g., linear models or neural networks. We demonstrate how to optimize the area under the receiver operating characteristic curve (AUC) effectively and also discuss how to adjust it to optimize other well-known evaluation metrics such as the accuracy and F1-measure. Finally, we show the effectiveness of our framework using benchmark datasets.


Minnesota Multiphasic Personality Inventory-2 (MMPI-2)

#artificialintelligence

The original Minnesota Multiphasic Personality Inventory (MMPI) was published in 1940 and the second revised version--the MMPI-2--was published in 1989. It is the most widely used psychometric test for measuring adult psychopathology in the world. The MMPI-2 is used in mental health, medical and employment settings. The test developers Hathaway and McKinley used an empirical test construction technique to develop the MMPI. This involved basing the test scales (for example the hypochondriasis scale) on the actual test items that differentiate people with hypochondriasis from 'normals'.


Private Protocols for U-Statistics in the Local Model and Beyond

arXiv.org Machine Learning

In this paper, we study the problem of computing $U$-statistics of degree $2$, i.e., quantities that come in the form of averages over pairs of data points, in the local model of differential privacy (LDP). The class of $U$-statistics covers many statistical estimates of interest, including Gini mean difference, Kendall's tau coefficient and Area under the ROC Curve (AUC), as well as empirical risk measures for machine learning problems such as ranking, clustering and metric learning. We first introduce an LDP protocol based on quantizing the data into bins and applying randomized response, which guarantees an $\epsilon$-LDP estimate with a Mean Squared Error (MSE) of $O(1/\sqrt{n}\epsilon)$ under regularity assumptions on the $U$-statistic or the data distribution. We then propose a specialized protocol for AUC based on a novel use of hierarchical histograms that achieves MSE of $O(\alpha^3/n\epsilon^2)$ for arbitrary data distributions. We also show that 2-party secure computation allows us to design a protocol with MSE of $O(1/n\epsilon^2)$, without any assumption on the kernel function or data distribution and with total communication linear in the number of users $n$. Finally, we evaluate the performance of our protocols through experiments on synthetic and real datasets.
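To make "averages over pairs of data points" concrete, here is a minimal non-private sketch of two of the quantities named in this snippet. It is not one of the paper's LDP protocols and adds no privacy noise. The function names, and the convention of counting tied scores as one half in the AUC, are assumptions made for the example.

```python
from itertools import combinations


def u_statistic(points, kernel):
    """Average a symmetric kernel over all unordered pairs of distinct points."""
    pairs = list(combinations(points, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def gini_mean_difference(points):
    """Degree-2 U-statistic with kernel |x - y|."""
    return u_statistic(points, lambda x, y: abs(x - y))


def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5 (an assumed convention).
    """
    total = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))
```

Both functions touch every pair, so they cost time quadratic in the number of points. The same pairwise structure is what makes these estimates harder to compute when each user holds only one data point, as in the local model.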


Estimating regression errors without ground truth values

arXiv.org Machine Learning

Regression analysis is a standard supervised machine learning method used to model an outcome variable in terms of a set of predictor variables. In most real-world applications we do not know the true value of the outcome variable being predicted outside the training data, i.e., the ground truth is unknown. It is hence not straightforward to directly observe when the estimate from a model is potentially wrong, due to phenomena such as overfitting and concept drift. In this paper we present an efficient framework for estimating the generalization error of regression functions, applicable to any family of regression functions when the ground truth is unknown. We present a theoretical derivation of the framework and empirically evaluate its strengths and limitations. We find that it performs robustly and is useful for detecting concept drift in datasets in several real-world domains.


Out-of-distribution Detection in Classifiers via Generation

arXiv.org Machine Learning

By design, discriminatively trained neural network classifiers produce reliable predictions only for in-distribution samples. For their real-world deployments, detecting out-of-distribution (OOD) samples is essential. Assuming OOD to be outside the closed boundary of in-distribution, typical neural classifiers do not contain the knowledge of this boundary for OOD detection during inference. There have been recent approaches to instill this knowledge in classifiers by explicitly training the classifier with OOD samples close to the in-distribution boundary. However, these generated samples fail to cover the entire in-distribution boundary effectively, thereby resulting in a sub-optimal OOD detector. In this paper, we analyze the feasibility of such approaches by investigating the complexity of producing such "effective" OOD samples. We also propose a novel algorithm to generate such samples using a manifold learning network (e.g., variational autoencoder) and then train an $(n+1)$-class classifier for OOD detection, where the $(n+1)^{th}$ class represents the OOD samples. We compare our approach against several recent classifier-based OOD detectors on MNIST and Fashion-MNIST datasets. Overall the proposed approach consistently performs better than the others.