Performance Analysis
IEG: Robust Neural Network Training to Tackle Severe Label Noise
Zhang, Zizhao, Zhang, Han, Arik, Sercan O., Lee, Honglak, Pfister, Tomas
Collecting large-scale data with clean labels for supervised training of neural networks is practically challenging. Although noisy labels are usually cheap to acquire, existing methods suffer severely for training datasets with high noise ratios, making high-cost human labeling a necessity. Here we present a method to train neural networks in a way that is almost invulnerable to severe label noise by utilizing a tiny trusted set. Our method, named IEG, is based on three key insights: (i) Isolation of noisy labels, (ii) Escalation of useful supervision from mislabeled data, and (iii) Guidance from small trusted data. On CIFAR100 with a 40% uniform noise ratio and 10 trusted labeled data per class, our method achieves 80.2 ± 0.3% classification accuracy, only 1.4% higher error than a neural network trained without label noise. Moreover, increasing the noise ratio to 80%, our method still achieves a high accuracy of 75.5%. Training deep neural networks usually requires large-scale labeled data. However, the process of data labelling by humans is challenging and expensive in practice, especially in domains where expert annotators are needed such as medical imaging. A great number of methods have been proposed to train neural networks from datasets with noisy labels due to cheap acquisition (e.g.
Open-plan Glare Evaluator (OGE): A New Glare Prediction Model for Open-Plan Offices Using Machine Learning Algorithms
Wagdy, Ayman, Garcia-Hansen, Veronica, Elhenawy, Mohammed, Isoardi, Gillian, Drogemuller, Robin, Fathy, Fatma
Predicting discomfort glare in open-plan offices is a challenging problem since most of the available glare metrics were developed for cellular offices, which are typically daylight dominated. The problem with open-plan offices is that they are mainly dependent on electric lighting rather than daylight even when they have a fully glazed facade. In addition, the contrast between bright windows and the building's interior can be problematic and may cause discomfort glare to the building occupants. These problems can affect occupant productivity and wellbeing. Thus, it is important to develop a predictive model to avoid discomfort glare when designing open-plan offices. To the best of our knowledge, we are the first to adopt Machine Learning (ML) models to predict discomfort glare. In order to develop new glare predictive models for these types of offices, Post-Occupancy Evaluation (POE) and High Dynamic Range (HDR) images were collected from 80 occupants (n=80) in four different open-plan offices. Subsequently, various multi-region luminance values, luminance and glare indices were calculated and used as input features to train ML models. The accuracy of the ML model was compared to the accuracy of 24 indices, which were also evaluated using a Receiver Operating Characteristic (ROC) analysis to identify the best cutoff values (thresholds) for each index for open-plan configurations. Results showed that the ML glare model could predict glare in open-plan offices with an accuracy of 83.8% (0.80 true positive rate and 0.86 true negative rate), which outperformed the accuracy of the previously developed glare metrics.
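The ROC-based choice of a cutoff for a single glare index can be illustrated with a minimal sketch. The abstract does not say which criterion the authors used to pick the "best" cutoff; the sketch below assumes Youden's J statistic (TPR + TNR - 1), and the function name `best_cutoff` and the toy data are hypothetical.

```python
from typing import List, Tuple


def best_cutoff(scores: List[float], labels: List[int]) -> Tuple[float, float, float]:
    """Return (cutoff, tpr, tnr) maximizing Youden's J = TPR + TNR - 1.

    A sample is predicted positive (glare reported) when score >= cutoff.
    Youden's J is an assumed criterion, not necessarily the paper's.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    best_j, best_c, best_tpr, best_tnr = float("-inf"), 0.0, 0.0, 0.0
    # Every distinct observed score is a candidate cutoff.
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        tpr, tnr = tp / pos, tn / neg
        j = tpr + tnr - 1
        if j > best_j:  # ties keep the lowest cutoff
            best_j, best_c, best_tpr, best_tnr = j, c, tpr, tnr
    return best_c, best_tpr, best_tnr


# Hypothetical index values and occupant responses (1 = glare reported).
index_values = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
glare = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, tpr, tnr = best_cutoff(index_values, glare)
```

Other criteria, such as the point closest to the top-left corner of the ROC plot, can select a different cutoff on the same data.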
Extraction of Complex DNN Models: Real Threat or Boogeyman?
Atli, Buse Gul, Szyller, Sebastian, Juuti, Mika, Marchal, Samuel, Asokan, N.
Recently, machine learning (ML) has introduced advanced solutions to many domains. Since ML models provide a business advantage to model owners, protecting the intellectual property (IP) of ML models has emerged as an important consideration. Confidentiality of ML models can be protected by exposing them to clients only via prediction APIs. However, model extraction attacks can steal the functionality of ML models using the information leaked to clients through the results returned via the API. In this work, we question whether model extraction is a serious threat to complex, real-life ML models. We evaluate the current state-of-the-art model extraction attack (the Knockoff attack) against complex models. We reproduce and confirm the results in the Knockoff attack paper. But we also show that the performance of this attack can be limited by several factors, including ML model architecture and the granularity of the API response. Furthermore, we introduce a defense based on distinguishing queries used for the Knockoff attack from benign queries. Despite the limitations of the Knockoff attack, we show that a more realistic adversary can effectively steal complex ML models and evade known defenses.
7 Things You Should Know about ROC AUC
Models for different classification problems can be fitted by trying to maximize or minimize various performance measures. Measurements that address one aspect of a model's performance but not another are important to note so that we can make an informed decision and select the performance measures that best fit our design. ROC AUC is commonly used in many fields as a prominent measure to evaluate classifier performance, and researchers might favor one classifier over another due to a higher AUC. For a refresher on ROC AUC, a clear and concise explanation can be found here. If you are totally unfamiliar with ROC AUC you may find that this post digs into the subject a bit too deep, but I hope you will still find it useful or bookmark it for future reference.
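As a quick refresher, ROC AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition; the function name `roc_auc` and the toy data are illustrative, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both classes present")

    # A correctly ordered pair counts 1, a tie counts 0.5, otherwise 0.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Because only the ordering of the scores matters, any strictly increasing transformation of the scores leaves the AUC unchanged.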
bootstrapping
This wiki is about bootstrapping. "Recipe for yogurt: Add yogurt to milk." - Anon. Also see http://bootstrappable.org, which has pointers to a mailing list and IRC channel. Simple explanation: bootstrapping is about building a compiler using tools smaller than itself, as opposed to building a compiler using an already built version of itself. The problem with the second is: Where did that prebuilt binary come from?
Online control of the familywise error rate
Specifically, without knowing the future $p$-values, the analyst must irrevocably decide at each step whether to reject the null, such that with probability at least $1-\alpha$, there are no false rejections in the entire sequence. This paper unifies algorithm design concepts developed for offline FWER control and for online false discovery rate (FDR) control. Though Bonferroni, fallback procedures and Sidak's method can trivially be extended to the online setting, our main contribution is the design of new, adaptive online algorithms that control the FWER and per-family error rate (PFER) when the $p$-values are independent or locally dependent in time. Our experiments demonstrate substantial gains in power, also formally proved in an idealized Gaussian model. 1 Introduction Online multiple testing refers to the setting in which a potentially infinite stream of hypotheses $H_1, H_2, \ldots$ (respectively $p$-values $P_1, P_2, \ldots$) is tested sequentially one at a time. At each step $t \in \mathbb{N}$, one must decide whether to reject the current null hypothesis $H_t$ or not, without knowing the outcomes of all the future tests. Typically, we reject the null hypothesis when $P_t$ is smaller than some threshold $\alpha_t$. Let $\mathcal{R}$ represent the set of rejected null hypotheses, and $\mathcal{H}_0$ be the unknown set of true null hypotheses; then, $\mathcal{V} = \mathcal{R} \cap \mathcal{H}_0$ is the set of incorrectly rejected null hypotheses, also known as false discoveries. Denoting $V = |\mathcal{V}|$, some common error metrics are the false discovery rate (FDR), familywise error rate (FWER), per-family error rate (PFER) and power, which are defined as $\mathrm{FDR} = E\left[ V / (|\mathcal{R}| \vee 1) \right]$, $\mathrm{FWER} = \Pr\{V \geq 1\}$, $\mathrm{PFER} = E[V]$, $\mathrm{power} = E\left[ |\mathcal{H}_0^c \cap \mathcal{R}| / |\mathcal{H}_0^c| \right]$.
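The online extension of Bonferroni mentioned above can be sketched in a few lines. This is only the simple baseline, not the adaptive algorithms the paper contributes. It rejects $H_t$ when $P_t \le \alpha_t$ with $\alpha_t = \alpha \gamma_t$, where the weights $\gamma_t$ are nonnegative and sum to one, so a union bound gives FWER at most $\alpha$. The specific choice $\gamma_t = 6 / (\pi^2 t^2)$ and the function name `online_bonferroni` are assumptions made for illustration.

```python
import math
from typing import Iterable, Iterator


def online_bonferroni(p_values: Iterable[float], alpha: float = 0.05) -> Iterator[bool]:
    """Yield one irrevocable reject/accept decision per incoming p-value.

    Test level at step t: alpha_t = alpha * 6 / (pi^2 * t^2).
    Since sum_{t>=1} 1/t^2 = pi^2 / 6, the levels sum to alpha.
    The weight sequence is an assumed example, not taken from the paper.
    """
    for t, p in enumerate(p_values, start=1):
        alpha_t = alpha * 6.0 / (math.pi ** 2 * t ** 2)
        yield p <= alpha_t


# Hypothetical stream of p-values, processed one at a time.
decisions = list(online_bonferroni([0.001, 0.02, 0.0001, 0.5], alpha=0.05))
```

Because the generator consumes the stream lazily, each decision depends only on the current $p$-value and the step index, never on future values. The thresholds shrink quickly with $t$, which is the loss of power that adaptive procedures aim to reduce.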
Comparison of Generative Adversarial Networks Architectures Which Reduce Mode Collapse
Generative Adversarial Networks are known for their high quality outputs and versatility. However, they also suffer from mode collapse in their output data distribution. There have been many efforts to revamp the GAN model and reduce mode collapse. This paper focuses on two of these models, PacGAN and VEEGAN. This paper explains the mathematical theory behind the aforementioned models, and compares their degree of mode collapse with vanilla GAN using MNIST digits as input data. The result indicates that PacGAN performs slightly better than vanilla GAN in terms of mode collapse, and VEEGAN performs worse than both PacGAN and vanilla GAN. VEEGAN's poor performance may be attributed to average autoencoder loss in its objective function and small penalty for blurry features.
Learning Only from Relevant Keywords and Unlabeled Documents
Charoenphakdee, Nontawat, Lee, Jongyeong, Jin, Yiping, Wanvarie, Dittaya, Sugiyama, Masashi
We consider a document classification problem where document labels are absent but only relevant keywords of a target class and unlabeled documents are given. Although heuristic methods based on pseudo-labeling have been considered, theoretical understanding of this problem has still been limited. Moreover, previous methods cannot easily incorporate well-developed techniques in supervised text classification. In this paper, we propose a theoretically guaranteed learning framework that is simple to implement and has flexible choices of models, e.g., linear models or neural networks. We demonstrate how to optimize the area under the receiver operating characteristic curve (AUC) effectively and also discuss how to adjust it to optimize other well-known evaluation metrics such as the accuracy and F1-measure. Finally, we show the effectiveness of our framework using benchmark datasets.
Minnesota Multiphasic Personality Inventory-2 (MMPI-2)
The original Minnesota Multiphasic Personality Inventory (MMPI) was published in 1940 and the second revised version--the MMPI-2--was published in 1989. It is the most widely used psychometric test for measuring adult psychopathology in the world. The MMPI-2 is used in mental health, medical and employment settings. The test developers Hathaway and McKinley used an empirical test construction technique to develop the MMPI. This involved basing the test scales (for example the hypochondriasis scale) on the actual test items that differentiate people with hypochondriasis from 'normals'.
Private Protocols for U-Statistics in the Local Model and Beyond
Bell, James, Bellet, Aurélien, Gascón, Adrià, Kulkarni, Tejas
In this paper, we study the problem of computing $U$-statistics of degree $2$, i.e., quantities that come in the form of averages over pairs of data points, in the local model of differential privacy (LDP). The class of $U$-statistics covers many statistical estimates of interest, including Gini mean difference, Kendall's tau coefficient and Area under the ROC Curve (AUC), as well as empirical risk measures for machine learning problems such as ranking, clustering and metric learning. We first introduce an LDP protocol based on quantizing the data into bins and applying randomized response, which guarantees an $\epsilon$-LDP estimate with a Mean Squared Error (MSE) of $O(1/\sqrt{n}\epsilon)$ under regularity assumptions on the $U$-statistic or the data distribution. We then propose a specialized protocol for AUC based on a novel use of hierarchical histograms that achieves MSE of $O(\alpha^3/n\epsilon^2)$ for arbitrary data distributions. We also show that 2-party secure computation allows us to design a protocol with MSE of $O(1/n\epsilon^2)$, without any assumption on the kernel function or data distribution and with total communication linear in the number of users $n$. Finally, we evaluate the performance of our protocols through experiments on synthetic and real datasets.
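To make "averages over pairs of data points" concrete, the following sketch computes the plain, non-private $U$-statistic of degree 2 for a symmetric kernel, with Gini mean difference and Kendall's tau as examples. It does not implement any of the private protocols from the paper; the names `u_statistic`, `gini_kernel` and `kendall_kernel` are hypothetical.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def u_statistic(data: Sequence[T], kernel: Callable[[T, T], float]) -> float:
    """Average of a symmetric kernel over all unordered pairs of data points."""
    pairs = list(combinations(data, 2))
    if not pairs:
        raise ValueError("need at least two data points")
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def gini_kernel(a: float, b: float) -> float:
    """Kernel for the Gini mean difference: absolute difference of two values."""
    return abs(a - b)


def kendall_kernel(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Kernel for Kendall's tau: +1 for a concordant pair, -1 for discordant, 0 for ties."""
    prod = (a[0] - b[0]) * (a[1] - b[1])
    return float((prod > 0) - (prod < 0))


# Pairwise absolute differences are 1, 3 and 2, so their average is 2.0.
gini = u_statistic([1.0, 2.0, 4.0], gini_kernel)
# Perfectly concordant points give tau = 1.0.
tau = u_statistic([(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)], kendall_kernel)
```

The number of pairs grows quadratically in $n$, which is one reason a private protocol cannot simply release each pairwise term with its own noise.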