Negative Margin Matters: Understanding Margin in Few-shot Classification
Liu, Bin, Cao, Yue, Lin, Yutong, Li, Qi, Zhang, Zheng, Long, Mingsheng, Hu, Han
This paper introduces a negative margin loss to metric-learning-based few-shot learning methods. The negative margin loss significantly outperforms the regular softmax loss and achieves state-of-the-art accuracy on three standard few-shot classification benchmarks with few bells and whistles. These results run contrary to the common practice in the metric learning field, where the margin is zero or positive. To understand why the negative margin loss performs well for few-shot classification, we analyze the discriminability of learned features w.r.t. different margins for training and novel classes, both empirically and theoretically. We find that although a negative margin reduces the feature discriminability for training classes, it may also avoid falsely mapping samples of the same novel class to multiple peaks or clusters, and thus benefit the discrimination of novel classes.
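To make the role of the margin concrete, here is a minimal NumPy sketch of an additive-margin cosine softmax loss in which the margin is allowed to be negative. The function name, the `scale` parameter, and the specific additive form are assumptions chosen for illustration; the abstract does not spell out the paper's exact loss.

```python
import numpy as np

def margin_softmax_loss(features, weights, labels, margin=0.0, scale=10.0):
    """Additive-margin cosine softmax loss (illustrative form; margin may be negative)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = f @ w.T                               # cosine similarities, shape (n_samples, n_classes)
    onehot = np.eye(w.shape[0])[labels]         # 1 at the true class, 0 elsewhere
    logits = scale * (cos - margin * onehot)    # subtract the margin from the true-class similarity only
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-(log_prob * onehot).sum(axis=1).mean())

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4))   # toy embeddings
weights = rng.normal(size=(3, 4))    # one prototype vector per training class
labels = rng.integers(0, 3, size=8)

loss_negative = margin_softmax_loss(features, weights, labels, margin=-0.3)
loss_zero = margin_softmax_loss(features, weights, labels, margin=0.0)
loss_positive = margin_softmax_loss(features, weights, labels, margin=0.3)
```

In this form a negative margin raises the true-class logit, so the training objective is easier to satisfy than with a zero or positive margin; this is consistent with the abstract's remark that discriminability on the training classes is reduced.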
Corella: A Private Multi Server Learning Approach based on Correlated Queries
Ehteram, Hamidreza, Maddah-Ali, Mohammad Ali, Mirmohseni, Mahtab
The emerging applications of machine learning algorithms on mobile devices motivate us to offload the computation tasks of training a model or deploying a trained one to the cloud. One of the major challenges in this setup is to guarantee the privacy of the client's data. Various methods have been proposed in the literature to protect privacy. These include (i) adding noise to the client data, which reduces the accuracy of the result, (ii) using secure multiparty computation, which requires significant communication among the computing nodes or with the client, and (iii) relying on homomorphic encryption methods, which significantly increases the computation load. In this paper, we propose an alternative approach to protect the privacy of user data. The proposed scheme relies on a cluster of servers, each running a deep neural network, where at most $T$ of them, for some integer $T$, may collude. Each server is fed the client data with $\textit{strong}$ noise added. This makes the information leakage to each server information-theoretically negligible. On the other hand, the noises added for different servers are $\textit{correlated}$. This correlation among queries allows the system to be $\textit{trained}$ such that the client can recover the final result with high accuracy by combining the outputs of the servers with minor computational effort. Simulation results for various datasets demonstrate the accuracy of the proposed approach.
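The idea of correlated queries can be illustrated with a deliberately simplified toy: two non-colluding servers (so $T = 1$), a fixed linear map standing in for each server's model, and perfectly anti-correlated noise. All names and the linear server are assumptions made for this sketch; the paper trains deep networks and does not state that it uses this particular noise construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_queries(x, noise_std=50.0):
    """Return two noisy queries whose noise terms are perfectly anti-correlated."""
    z = rng.normal(scale=noise_std, size=x.shape)  # strong noise relative to the data
    return x + z, x - z

def server(query, weights):
    """Hypothetical server model: a fixed linear map (a stand-in for a trained network)."""
    return weights @ query

def client_combine(out_a, out_b):
    """Averaging cancels the anti-correlated noise when the server map is linear."""
    return 0.5 * (out_a + out_b)

x = np.array([0.2, -1.0, 0.5])                           # client data
weights = np.array([[1.0, 2.0, 3.0], [0.5, -1.0, 0.0]])  # toy model shared by both servers
q_a, q_b = make_queries(x)
recovered = client_combine(server(q_a, weights), server(q_b, weights))
```

Each server sees only its own heavily perturbed query, while the client recovers `weights @ x` with a single averaging step. With nonlinear servers the cancellation is no longer exact, which is presumably why the abstract says the system has to be trained.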
From unbiased MDI Feature Importance to Explainable AI for Trees
We attempt to give a unifying view of the various recent attempts to (i) improve the interpretability of tree-based models and (ii) debias the default variable-importance measure in random forests, Gini importance. In particular, we demonstrate a common thread among the out-of-bag based bias correction methods and their connection to local explanation for trees. In addition, we point out a bias caused by the inclusion of inbag data in the newly developed explainable AI for trees algorithms.
Convolutional Neural Networks for Image-based Corn Kernel Detection and Counting
Khaki, Saeed, Pham, Hieu, Han, Ye, Kuhl, Andy, Kent, Wade, Wang, Lizhi
Precise in-season corn grain yield estimates enable farmers to make accurate real-time harvest and grain marketing decisions, minimizing possible losses of profitability. A well-developed corn ear can have up to 800 kernels, but manually counting the kernels on an ear of corn is labor-intensive, time-consuming, and prone to human error. From an algorithmic perspective, detecting the kernels in a single corn ear image is challenging because of the large number of kernels at different angles and the very small distances between them. In this paper, we propose a kernel detection and counting method based on a sliding window approach. The proposed method detects and counts all corn kernels in a single corn ear image taken in uncontrolled lighting conditions. The sliding window approach uses a convolutional neural network (CNN) for kernel detection. Then, non-maximum suppression (NMS) is applied to remove overlapping detections. Finally, windows that are classified as kernels are passed to another CNN regression model that finds the (x,y) coordinates of the center of each kernel image patch. Our experiments indicate that the proposed method can successfully detect the corn kernels with a low detection error and is also able to detect kernels on a batch of corn ears positioned at different angles.
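The NMS step mentioned above can be sketched as the standard greedy procedure below. The box format `(x1, y1, x2, y2)`, the function name, and the 0.5 IoU threshold are assumptions for illustration; the abstract does not give the paper's settings.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much, repeat."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)          # assumes boxes with positive area
    order = np.argsort(scores)[::-1]       # indices sorted by descending score
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the best box with every remaining box.
        inter_w = np.clip(np.minimum(x2[best], x2[rest]) - np.maximum(x1[best], x1[rest]), 0, None)
        inter_h = np.clip(np.minimum(y2[best], y2[rest]) - np.maximum(y1[best], y1[rest]), 0, None)
        inter = inter_w * inter_h
        iou = inter / (areas[best] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]  # keep only boxes that overlap little with the best one
    return keep

windows = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]  # toy detection windows
kept = non_max_suppression(windows, scores=[0.9, 0.8, 0.7])
```

Here the second window overlaps the first with an IoU of about 0.68, so it is suppressed, and the number of surviving windows serves as the kernel count in this toy setting.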
Obliviousness Makes Poisoning Adversaries Weaker
Garg, Sanjam, Jha, Somesh, Mahloujifar, Saeed, Mahmoody, Mohammad, Thakurta, Abhradeep
Poisoning attacks have emerged as a significant security threat to machine learning (ML) algorithms. It has been demonstrated that adversaries who make small changes to the training set, such as adding specially crafted data points, can hurt the performance of the output model. Most of these attacks require full knowledge of the training data or the underlying data distribution. In this paper we study the power of oblivious adversaries who do not have any information about the training set. We show a separation between oblivious and full-information poisoning adversaries. Specifically, we construct a sparse linear regression problem for which the LASSO estimator is robust against oblivious adversaries whose goal is to add non-relevant features to the model with a certain poisoning budget. On the other hand, non-oblivious adversaries with the same budget can craft poisoning examples based on the rest of the training data and successfully add non-relevant features to the model.
Estimating Treatment Effects with Observed Confounders and Mediators
Gupta, Shantanu, Lipton, Zachary C., Childers, David
Given a causal graph, the do-calculus can express treatment effects as functionals of the observational joint distribution that can be estimated empirically. Sometimes the do-calculus identifies multiple valid formulae, prompting us to compare the statistical properties of the corresponding estimators. For example, the backdoor formula applies when all confounders are observed and the frontdoor formula applies when an observed mediator transmits the causal effect. In this paper, we investigate the over-identified scenario where both confounders and mediators are observed, rendering both estimators valid. Addressing the linear Gaussian causal model, we derive the finite-sample variance for both estimators and demonstrate that either estimator can dominate the other by an unbounded constant factor depending on the model parameters. Next, we derive an optimal estimator, which leverages all observed variables to strictly outperform the backdoor and frontdoor estimators. We also present a procedure for combining two datasets, with confounders observed in one and mediators in the other. Finally, we evaluate our methods on both simulated data and the IHDP and JTPA datasets.
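As a rough illustration of the over-identified setting, the simulation below draws data from a linear Gaussian model with an observed confounder `w` and an observed mediator `m`, then estimates the effect of `x` on `y` with both a backdoor-style and a frontdoor-style regression. The edge weights, sample size, and helper `ols` are hypothetical choices for this sketch, not values or code from the paper.

```python
import numpy as np

def ols(y, *regressors):
    """Least-squares coefficients of y on the regressors (no intercept; all variables are zero-mean)."""
    design = np.column_stack(regressors)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 20000
a, c, b, d = 1.0, 0.8, 1.5, 2.0          # hypothetical edge weights
w = rng.normal(size=n)                   # confounder
x = a * w + rng.normal(size=n)           # treatment, affected by w
m = c * x + rng.normal(size=n)           # mediator, affected only by x
y = b * m + d * w + rng.normal(size=n)   # outcome, affected by m and w

true_effect = c * b
# Backdoor: adjust for the confounder and read off the coefficient on x.
backdoor = ols(y, x, w)[0]
# Frontdoor: (effect of x on m) times (effect of m on y, holding x fixed).
frontdoor = ols(m, x)[0] * ols(y, m, x)[0]
```

Both estimates should land close to `true_effect`; repeating the simulation over many seeds and comparing their spreads is one way to see the variance trade-off that the abstract analyzes.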
StrokeCoder: Path-Based Image Generation from Single Examples using Transformers
Wieluch, Sabine, Schwenker, Friedhelm
This paper demonstrates how a Transformer Neural Network can be used to learn a Generative Model from a single path-based example image. We further show how a data set can be generated from the example image and how the model can be used to generate a large set of deviated images, which still represent the original image's style and concept.
A general framework for causal classification
Li, Jiuyong, Zhang, Weijia, Liu, Lin, Yu, Kui, Le, Thuc Duy, Liu, Jixue
In many applications, there is a need to predict the effect of an intervention on different individuals from data. For example, which customers are persuadable by a product promotion? Which groups would benefit from a new policy? These are typical causal classification questions involving the effect or the change in outcomes made by an intervention. The questions cannot be answered with traditional classification methods as they only deal with static outcomes. In marketing research these questions are often answered with uplift modelling, using experimental data. Some machine learning methods have been proposed for heterogeneous causal effect estimation using either experimental or observational data. In principle these methods can be used for causal classification, but the limited number of causal heterogeneity modelling methods, which are mainly tree-based, is inadequate for various real-world applications. In this paper, we propose a general framework for causal classification, as a generalisation of both uplift modelling and causal heterogeneity modelling. When developing the framework, we have identified the conditions where causal classification in both observational and experimental data can be resolved by a naive solution using off-the-shelf classification methods, which supports flexible implementations for various applications. This result not only enables a practical way to solve the causal classification problem by using any existing classification method in the proposed framework, but also makes it possible to cross-use the methods developed in both the uplift modelling and causal heterogeneity modelling areas when the conditions are satisfied. Experiments have shown that our framework with off-the-shelf classification methods is as competitive as the tailor-designed uplift modelling and heterogeneous causal effect modelling methods.
Triad State Space Construction for Chaotic Signal Classification with Deep Learning
Inspired by the well-known permutation entropy (PE), an effective image encoding scheme for chaotic time series, Triad State Space Construction (TSSC), is proposed. The TSSC image can recognize higher-order temporal patterns and identify new forbidden regions in time series motifs beyond the Bandt-Pompe probabilities. The Convolutional Neural Network (ConvNet) is widely used in image classification. The ConvNet classifier based on TSSC images (TSSC-ConvNet) is highly accurate and very robust in chaotic signal classification.
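For readers unfamiliar with the permutation entropy that motivates TSSC, the sketch below computes the Bandt-Pompe ordinal-pattern distribution and its normalized entropy. It illustrates PE only, not the TSSC encoding itself, and the function name and default parameters are assumptions.

```python
import math
from collections import Counter

import numpy as np

def permutation_entropy(series, order=3, delay=1):
    """Normalized permutation entropy in [0, 1] from ordinal-pattern frequencies."""
    x = np.asarray(series, dtype=float)
    n_windows = len(x) - (order - 1) * delay
    # Each window of `order` samples is mapped to the permutation that sorts it.
    patterns = Counter(
        tuple(np.argsort(x[i:i + order * delay:delay], kind="stable"))
        for i in range(n_windows)
    )
    probs = np.array(list(patterns.values()), dtype=float) / n_windows
    entropy = -np.sum(probs * np.log(probs))
    return float(entropy / math.log(math.factorial(order)))

rng = np.random.default_rng(0)
pe_monotone = permutation_entropy(np.arange(100))        # a single ordinal pattern
pe_noise = permutation_entropy(rng.uniform(size=5000))   # all patterns roughly equally likely
```

A monotone series yields a value of 0 and white noise a value near 1. Deterministic chaotic signals typically fall in between because some ordinal patterns never occur, which is the notion of forbidden patterns that the abstract builds on.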
A lower bound for the ELBO of the Bernoulli Variational Autoencoder
Sicks, Robert, Korn, Ralf, Schwaar, Stefanie
We consider a variational autoencoder (VAE) for binary data. Our main innovations are an interpretable lower bound for its training objective, a modified initialization and architecture of such a VAE that leads to faster training, and decision support for finding the appropriate dimension of the latent space using PCA. Numerical examples illustrate our theoretical result and the performance of the new architecture.
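For context, the training objective being bounded is the standard ELBO of a Bernoulli VAE. The sketch below evaluates a single-sample version of it for one data point, assuming a diagonal Gaussian posterior and a standard normal prior; the function and argument names are hypothetical, and the paper's lower bound itself is not reproduced here.

```python
import numpy as np

def bernoulli_elbo(x, logits, mu, logvar):
    """Single-sample ELBO: Bernoulli log-likelihood minus KL(q(z|x) || N(0, I))."""
    x, logits = np.asarray(x, dtype=float), np.asarray(logits, dtype=float)
    mu, logvar = np.asarray(mu, dtype=float), np.asarray(logvar, dtype=float)
    # log p(x|z) for Bernoulli outputs with the given decoder logits:
    # x*log(sigmoid(l)) + (1-x)*log(1-sigmoid(l)) = x*l - softplus(l)
    log_likelihood = np.sum(x * logits - np.logaddexp(0.0, logits))
    # Closed-form KL divergence between N(mu, diag(exp(logvar))) and N(0, I).
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
    return float(log_likelihood - kl)

x = np.array([1.0, 0.0, 1.0, 1.0])   # one binary data point
elbo_uninformed = bernoulli_elbo(x, logits=np.zeros(4), mu=np.zeros(2), logvar=np.zeros(2))
```

With all-zero decoder logits and a posterior equal to the prior, the KL term vanishes and the ELBO reduces to `-4 * log(2)`, the log-likelihood of four fair coin flips.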