Goto

Collaborating Authors

 Country


A Novel Visual Fault Detection and Classification System for Semiconductor Manufacturing Using Stacked Hybrid Convolutional Neural Networks

arXiv.org Machine Learning

Automated visual inspection in the semiconductor industry aims to detect and classify manufacturing defects utilizing modern image processing techniques. While an earliest possible detection of defect patterns allows quality control and automation of manufacturing chains, manufacturers benefit from an increased yield and reduced manufacturing costs. Since classical image processing systems are limited in their ability to detect novel defect patterns, and machine learning approaches often involve a tremendous amount of computational effort, this contribution introduces a novel deep neural network-based hybrid approach. Unlike classical deep neural networks, a multi-stage system allows the detection and classification of the finest structures in pixel size within high-resolution imagery. Consisting of stacked hybrid convolutional neural networks (SH-CNN) and inspired by current approaches of visual attention, the realized system draws the focus over the level of detail from its structures to more task-relevant areas of interest. The results of our test environment show that the SH-CNN outperforms current approaches of learning-based automated visual inspection, whereas a distinction depending on the level of detail enables the elimination of defect patterns in earlier stages of the manufacturing process.


A Novel Unsupervised Post-Processing Calibration Method for DNNS with Robustness to Domain Shift

arXiv.org Machine Learning

The uncertainty estimation is critical in real-world decision making applications, especially when distributional shift between the training and test data are prevalent. Many calibration methods in the literature have been proposed to improve the predictive uncertainty of DNNs which are generally not well-calibrated. However, none of them is specifically designed to work properly under domain shift condition. In this paper, we propose Unsupervised Temperature Scaling (UTS) as a robust calibration method to domain shift. It exploits unlabeled test samples instead of the training one to adjust the uncertainty prediction of deep models towards the test distribution. UTS utilizes a novel loss function, weighted NLL, which allows unsupervised calibration. We evaluate UTS on a wide range of model-datasets to show the possibility of calibration without labels and demonstrate the robustness of UTS compared to other methods (e.g., TS, MCdropout, SVI, ensembles) in shifted domains. The predictive distributions provided by Deep Neural Networks (DNNs) have been increasingly used for decision-support systems, for applications ranging from medical diagnoses assistance (Esteva et al., 2017) to self-driving cars (Bojarski et al., 2016). In DNNs, the predictive distributions usually corresponds to the output of a softmax layer, which is typically interpreted as the confidence over the different classes. The i.i.d hypothesis made in learning usually assumes that the data distributions over the classes are the same at learning and inference time. However, in real-world applications, the distribution of data at inference time (i.e., the test data) may shift and actually be different from the original training distribution - corresponding to distribution shift in representation of data which we refer that as domain shift. For instance, in image classification problem, domain shift happens when the test images are different in illumination, view point, resolution, background or intensity noise from the training set. However, they are the same classification problem with the same objects occurance rate.


Structured Multi-Hashing for Model Compression

arXiv.org Machine Learning

Despite the success of deep neural networks (DNNs), state-of-the-art models are too large to deploy on low-resource devices or common server configurations in which multiple models are held in memory. Model compression methods address this limitation by reducing the memory footprint, latency, or energy consumption of a model with minimal impact on accuracy. We focus on the task of reducing the number of learnable variables in the model. In this work we combine ideas from weight hashing and dimensionality reductions resulting in a simple and powerful structured multi-hashing method based on matrix products that allows direct control of model size of any deep network and is trained end-to-end. We demonstrate the strength of our approach by compressing models from the ResNet, EfficientNet, and MobileNet architecture families. Our method allows us to drastically decrease the number of variables while maintaining high accuracy. For instance, by applying our approach to EfficentNet-B4 (16M parameters) we reduce it to to the size of B0 (5M parameters), while gaining over 3% in accuracy over B0 baseline. On the commonly used benchmark CIFAR10 we reduce the ResNet32 model by 75% with no loss in quality, and are able to do a 10x compression while still achieving above 90% accuracy.


DeepJSCC-f: Deep Joint-Source Channel Coding of Images with Feedback

arXiv.org Machine Learning

We consider wireless transmission of images in the presence of channel output feedback. From a Shannon theoretic perspective feedback does not improve the asymptotic end-to-end performance, and separate source coding followed by capacity achieving channel coding achieves the optimal performance. Although it is well known that separation is not optimal in the practical finite blocklength regime, there are no known practical joint source-channel coding (JSCC) schemes that can exploit the feedback signal and surpass the performance of separate schemes. Inspired by the recent success of deep learning methods for JSCC, we investigate how noiseless or noisy channel output feedback can be incorporated into the transmission system to improve the reconstruction quality at the receiver. We introduce an autoencoder-based deep JSCC scheme that exploits the channel output feedback, and provides considerable improvements in terms of the end-to-end reconstruction quality for fixed length transmission, or in terms of the average delay for variable length transmission. To the best of our knowledge, this is the first practical JSCC scheme that can fully exploit channel output feedback, demonstrating yet another setting in which modern machine learning techniques can enable the design of new and efficient communication methods that surpass the performance of traditional structured coding-based designs.


Manifold Gradient Descent Solves Multi-Channel Sparse Blind Deconvolution Provably and Efficiently

arXiv.org Machine Learning

Multi-channel sparse blind deconvolution, or convolutional sparse coding, refers to the problem of learning an unknown filter by observing its circulant convolutions with multiple input signals that are sparse. This problem finds numerous applications in signal processing, computer vision, and inverse problems. However, it is challenging to learn the filter efficiently due to the bilinear structure of the observations with respect to the unknown filter and inputs, leading to global ambiguities of identification. In this paper, we propose a novel approach based on nonconvex optimization over the sphere manifold by minimizing a smooth surrogate of the sparsity-promoting loss function. It is demonstrated that the manifold gradient descent with random initializations will provably recover the filter, up to scaling and shift ambiguity, as soon as the number of observations is sufficiently large under an appropriate random data model. Numerical experiments are provided to illustrate the performance of the proposed method with comparisons to existing methods.


Rigging the Lottery: Making All Tickets Winners

arXiv.org Machine Learning

Sparse neural networks have been shown to be more parameter and compute efficient compared to dense networks and in some cases are used to decrease wall clock inference times. There is a large body of work on training dense networks to yield sparse networks for inference (Molchanov et al., 2017; Zhu & Gupta, 2018; Narang et al., 2017; Li et al., 2016; Guo et al., 2016). This limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. Importantly,by adjusting the topology it can start from any initialization - not just "lucky" ones. We demonstrate state-of-the-art sparse training results with ResNet-50, MobileNet v1 and MobileNet v2 on the ImageNet-2012 dataset, WideResNets on the CIFAR-10 dataset and RNNs on the WikiText-103 dataset. Finally, we provide some insights into why allowing the topology to change during the optimization can overcome local minima encountered when the topology remains static. 1 Introduction The parameter and floating point operation (FLOP) efficiency of sparse neural networks is now well demonstrated on a variety of problems (Han et al., 2015; Srinivas et al., 2017). Multiple works have shown inference time speedups are possible using sparsity for both Recurrent Neural Networks (RNNs) (Kalchbrenner et al., 2018) and Convolutional Neural Networks (ConvNets) (Park et al., 2016; Elsen et al., 2019). Currently, the most accurate sparse models are obtained with techniques that require, at a minimum, the cost of training a dense model in terms of memory and FLOPs (Zhu & Gupta, 2018; Guo et al., 2016), and sometimes significantly more (Molchanov et al., 2017). This paradigm has two main limitations: 1. The maximum size of sparse models is limited to the largest dense model that can be trained. Even if sparse models are more parameter efficient, we can't use pruning to train models that are larger and more accurate than the largest possible dense models. Large amounts of computation must be performed for parameters that are zero valued or that will be zero during inference. Gale et al. (2019) found that three different dense-to-sparse training algorithms all achieve about the same sparsity / accuracy tradeoff.


A New Distribution-Free Concept for Representing, Comparing, and Propagating Uncertainty in Dynamical Systems with Kernel Probabilistic Programming

arXiv.org Machine Learning

This work presents the concept of kernel mean embedding and kernel probabilistic programming in the context of stochastic systems. We propose formulations to represent, compare, and propagate uncertainties for fairly general stochastic dynamics in a distribution-free manner. The new tools enjoy sound theory rooted in functional analysis and wide applicability as demonstrated in distinct numerical examples. The implication of this new concept is a new mode of thinking about the statistical nature of uncertainty in dynamical systems.1. INTRODUCTION Classic stochastic control methods such as LQG hinge on the mathematical fact that the family of Gaussian distributions is closed under an affine transformation.


A Coefficient of Determination for Probabilistic Topic Models

arXiv.org Machine Learning

--This research proposes a new (old) metric for evaluating goodness of fit in topic models, the coefficient of determination, or R 2 . Within the context of topic modeling, R 2 has the same interpretation that it does when used in a broader class of statistical models. Reporting R 2 with topic models addresses two current problems in topic modeling: a lack of standard cross-contextual evaluation metrics for topic modeling and ease of communication with lay audiences. The author proposes that R 2 should be reported as a standard metric when constructing topic models. I NTRODUCTION According to an often-quoted but never cited definition, "the goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question." 1 Goodness of fit measures vary with the goals of those constructing the statistical model. Inferential goals may emphasize in-sample fit while predictive goals may emphasize out-of-sample fit. Prior information may be included in the goodness of fit measure for Bayesian models, or it may not. Goodness of fit measures may include methods to correct for model overfitting. In short, goodness of fit measures the performance of a statistical model against the ground truth of observed data. Fitting the data well is generally a necessary--though not sufficient--condition for trust in a statistical model, whatever its goals. Of course, goodness of fit is only one concern in statistical modeling.


ROIPCA: An Online PCA algorithm based on rank-one updates

arXiv.org Machine Learning

Principal components analysis (PCA) is a fundamental algorithm in data analysis. Its online version is useful in many modern applications where the data are too large to fit in memory, or when speed of calculation is important. In this paper we propose ROIPCA, an online PCA algorithm based on rank-one updates. ROIPCA is linear in both the dimension of the data and the number of components calculated. We demonstrate its advantages over existing state-of-the-art algorithms in terms of accuracy and running time.


Resampling-based Confidence Intervals for Model-free Robust Inference on Optimal Treatment Regimes

arXiv.org Machine Learning

Recently, there has been growing interest in estimating optimal treatment regimes which are individualized decision rules that can achieve maximal average outcomes. This paper considers the problem of inference for optimal treatment regimes in the model-free setting, where the specification of an outcome regression model is not needed. Existing model-free estimators are usually not suitable for the purpose of inference because they either have nonstandard asymptotic distributions, or are designed to achieve fisher-consistent classification performance. This paper first studies a smoothed robust estimator that directly targets estimating the parameters corresponding to the Bayes decision rule for estimating the optimal treatment regime. This estimator is shown to have an asymptotic normal distribution. Furthermore, it is proved that a resampling procedure provides asymptotically accurate inference for both the parameters indexing the optimal treatment regime and the optimal value function. A new algorithm is developed to calculate the proposed estimator with substantially improved speed and stability. Numerical results demonstrate the satisfactory performance of the new methods.