linear learner
What Distributions are Robust to Indiscriminate Poisoning Attacks for Linear Learners?
We study indiscriminate poisoning for linear learners where an adversary injects a few crafted examples into the training data with the goal of forcing the induced model to incur higher test error. Inspired by the observation that linear learners on some datasets are able to resist the best known attacks even without any defenses, we further investigate whether datasets can be inherently robust to indiscriminate poisoning attacks for linear learners. For theoretical Gaussian distributions, we rigorously characterize the behavior of an optimal poisoning attack, defined as the poisoning strategy that attains the maximum risk of the induced model at a given poisoning budget. Our results prove that linear learners can indeed be robust to indiscriminate poisoning if the class-wise data distributions are well-separated with low variance and the size of the constraint set containing all permissible poisoning points is also small. These findings largely explain the drastic variation in empirical attack performance of the state-of-the-art poisoning attacks on linear learners across benchmark datasets, taking an important initial step towards understanding why some learning tasks are vulnerable to data poisoning attacks.
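A minimal formalization of the optimal-attack notion described above may help fix ideas. The notation below (the clean training set \(\mathcal{D}_c\), the constraint set \(\mathcal{C}\), the budget \(\epsilon\), the training map \(\hat{\theta}\), and the 0-1 risk under a test distribution \(\mu\)) is introduced here for illustration and is not necessarily the paper's own.

```latex
% A minimal sketch (our notation): the optimal indiscriminate poisoning attack
% maximizes the induced model's risk subject to the budget and constraint set.
\[
  \mathcal{D}_p^{\star} \in
  \operatorname*{arg\,max}_{\substack{\mathcal{D}_p \subseteq \mathcal{C},\;
                                      |\mathcal{D}_p| \le \epsilon\,|\mathcal{D}_c|}}
  \operatorname{Risk}\!\big(\hat{\theta}(\mathcal{D}_c \cup \mathcal{D}_p)\big),
  \qquad
  \operatorname{Risk}(\theta) =
  \Pr_{(x,y)\sim\mu}\!\big[\operatorname{sign}(\theta^{\top}x) \ne y\big],
\]
where $\mathcal{D}_c$ is the clean training set, $\mathcal{C}$ is the constraint set of
permissible poisoning points, $\epsilon$ is the poisoning budget, $\hat{\theta}$ is the
linear learner's training map, and $\mu$ is the clean test distribution.
```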
Multi-Study Boosting: Theoretical Considerations for Merging vs. Ensembling
Shyr, Cathy, Sur, Pragya, Parmigiani, Giovanni, Patil, Prasad
Cross-study replicability is a powerful model evaluation criterion that emphasizes generalizability of predictions. When training cross-study replicable prediction models, it is critical to decide between merging and treating the studies separately. We study boosting algorithms in the presence of potential heterogeneity in predictor-outcome relationships across studies and compare two multi-study learning strategies: 1) merging all the studies and training a single model, and 2) multi-study ensembling, which involves training a separate model on each study and ensembling the resulting predictions. In the regression setting, we provide theoretical guidelines based on an analytical transition point to determine whether it is more beneficial to merge or to ensemble for boosting with linear learners. In addition, we characterize a bias-variance decomposition of estimation error for boosting with component-wise linear learners. We verify the theoretical transition point result in simulation and illustrate how it can guide the decision on merging vs. ensembling in an application to breast cancer gene expression data.
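To make the merging vs. ensembling comparison concrete, the sketch below contrasts the two strategies using L2-boosting with component-wise linear base learners on synthetic data. The data-generating process, the learning rate, the number of boosting steps, and the equal-weight ensemble are illustrative assumptions, not the paper's exact simulation design.

```python
# Hedged sketch: merging vs. multi-study ensembling for L2-boosting with
# component-wise linear base learners. Data-generating process, learning
# rate, and step count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def l2_boost(X, y, n_steps=50, lr=0.1):
    """L2-boosting: at each step, fit the single predictor that best
    explains the current residuals and take a small step toward it."""
    n, p = X.shape
    coef = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept
    for _ in range(n_steps):
        betas = (X.T @ resid) / (X ** 2).sum(axis=0)      # per-coordinate LS fits
        sse = ((resid[:, None] - X * betas) ** 2).sum(axis=0)
        j = np.argmin(sse)                                 # best single predictor
        coef[j] += lr * betas[j]
        resid = resid - lr * betas[j] * X[:, j]
    return intercept, coef

def predict(model, X):
    intercept, coef = model
    return intercept + X @ coef

# Two studies with heterogeneous predictor-outcome relationships.
def make_study(beta, n=200, p=5, noise=1.0):
    X = rng.normal(size=(n, p))
    return X, X @ beta + noise * rng.normal(size=n)

beta1 = np.array([1.0, 0.5, 0.0, 0.0, 0.0])
beta2 = beta1 + rng.normal(scale=0.8, size=5)   # cross-study heterogeneity
(X1, y1), (X2, y2) = make_study(beta1), make_study(beta2)
X_test, y_test = make_study(beta1, n=1000)      # evaluate on study-1-like data

# Strategy 1: merge the studies and train a single model.
merged = l2_boost(np.vstack([X1, X2]), np.concatenate([y1, y2]))
# Strategy 2: train a model per study and average the predictions.
m1, m2 = l2_boost(X1, y1), l2_boost(X2, y2)

mse = lambda pred: float(np.mean((pred - y_test) ** 2))
print(f"merged MSE:    {mse(predict(merged, X_test)):.3f}")
print(f"ensembled MSE: {mse(0.5 * (predict(m1, X_test) + predict(m2, X_test))):.3f}")
```

Varying the scale of the cross-study heterogeneity (the 0.8 perturbation above) is one simple way to probe on which side of a merging-vs.-ensembling transition a given setting falls.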
Causality-aware counterfactual confounding adjustment as an alternative to linear residualization in anticausal prediction tasks based on linear learners
Linear residualization is a common practice for confounding adjustment in machine learning (ML) applications. Recently, causality-aware predictive modeling has been proposed as an alternative, causality-inspired approach to adjusting for confounders. The basic idea is to simulate counterfactual data that is free from the spurious associations generated by the observed confounders. In this paper, we compare the linear residualization approach against the causality-aware confounding adjustment in anticausal prediction tasks, and show that the causality-aware approach tends to (asymptotically) outperform the residualization adjustment in terms of predictive performance with linear learners. Importantly, our results still hold even when the true model is not linear. We illustrate our results in both regression and classification tasks, comparing the causality-aware and residualization approaches using mean squared error and classification accuracy in synthetic data experiments where the linear regression model is misspecified, as well as when it is correctly specified. Furthermore, we illustrate how the causality-aware approach is more stable than residualization with respect to dataset shifts in the joint distribution of the confounders and outcome variables.
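The contrast between the two adjustments can be illustrated with a small synthetic anticausal example. The generative model, the use of scikit-learn's `LinearRegression`, and the particular way the estimated confounder contribution is subtracted below are assumptions made for this sketch, not the paper's procedure.

```python
# Hedged sketch of the two confounding adjustments in an anticausal setting
# where the outcome y and a confounder c both generate the features X.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, p = 4000, 10
tr, te = slice(0, n // 2), slice(n // 2, n)

# Anticausal generative model: y and c are correlated, and both generate X.
c = rng.normal(size=n)
y = 0.8 * c + rng.normal(size=n)
gamma = rng.normal(size=p)                     # effect of y on X
beta = rng.normal(size=p)                      # confounding effect of c on X
X = np.outer(y, gamma) + np.outer(c, beta) + rng.normal(size=(n, p))

# (1) Linear residualization: regress X on c alone (training half) and keep
#     the residuals. Because y and c are correlated, part of the outcome
#     signal is removed along with the confounding.
res = LinearRegression().fit(c[tr, None], X[tr])
X_resid = X - res.predict(c[:, None])

# (2) Causality-aware-style adjustment: estimate c's contribution to X while
#     conditioning on y (training half), then subtract only that part,
#     mimicking counterfactual features in which c no longer influences X.
full = LinearRegression().fit(np.column_stack([y[tr], c[tr]]), X[tr])
beta_hat = full.coef_[:, 1]                    # per-feature coefficients on c
X_cf = X - np.outer(c, beta_hat)

def holdout_mse(features):
    """Fit a linear learner on the training half, score on the held-out half."""
    m = LinearRegression().fit(features[tr], y[tr])
    return float(np.mean((m.predict(features[te]) - y[te]) ** 2))

print("residualized MSE   :", round(holdout_mse(X_resid), 3))
print("causality-aware MSE:", round(holdout_mse(X_cf), 3))
```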
Announcing ML.NET 0.4
A few months ago we released ML.NET 0.1 at //Build 2018. ML.NET is a cross-platform, open source machine learning framework for .NET developers. We've gotten great feedback so far and would like to thank the community for your engagement as we continue to develop ML.NET together in the open. We are happy to announce the latest version: ML.NET 0.4. In this release we've improved support for natural language processing (NLP) scenarios by adding the Word Embedding Transform, improved the speed of linear learners used for binary classification and linear regression by adding support for the SymSGD learner, made improvements to the F# API and samples for ML.NET, fixed bugs, and more. Additionally, we would really like your feedback on making ML.NET easy to use.
Diversity Regularized Machine
Yu, Yang (Nanjing University) | Li, Yu-Feng (Nanjing University) | Zhou, Zhi-Hua (Nanjing University)
Ensemble methods, which train multiple learners for a task, are among the state-of-the-art learning approaches. The diversity of the component learners has been recognized as a key to a good ensemble, and existing ensemble methods try different ways to encourage diversity, mostly through heuristics. In this paper, we propose the diversity regularized machine (DRM) in a mathematical programming framework, which efficiently generates an ensemble of diverse support vector machines (SVMs). Theoretical analysis discloses that the diversity constraint used in DRM can lead to an effective reduction of its hypothesis space complexity, implying that diversity control in ensemble methods indeed plays the role of regularization, as in popular statistical learning approaches. Experiments show that DRM can significantly improve generalization ability and is superior to some state-of-the-art SVM ensemble methods.
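A rough mathematical-programming sketch may help convey how a diversity constraint enters an SVM ensemble. The formulation below, with hinge losses for each component SVM and a bound $q$ on pairwise weight-vector correlations, is a paraphrase of the general idea under our own assumptions, not necessarily the exact program solved by DRM.

```latex
% Hedged sketch: an ensemble of T linear SVMs with an explicit pairwise
% diversity constraint (the correlation bound q is our illustrative choice).
\[
\begin{aligned}
  \min_{w_1,\dots,w_T,\;\xi \ge 0} \quad
    & \sum_{t=1}^{T}\Big( \tfrac{1}{2}\lVert w_t\rVert^2
      + C \sum_{i=1}^{n} \xi_{t,i} \Big) \\
  \text{s.t.} \quad
    & y_i\, w_t^{\top} x_i \;\ge\; 1 - \xi_{t,i},
      && t=1,\dots,T,\; i=1,\dots,n, \\
    & \frac{w_s^{\top} w_t}{\lVert w_s\rVert\,\lVert w_t\rVert} \;\le\; q,
      && 1 \le s < t \le T,
\end{aligned}
\]
where $T$ is the ensemble size; a smaller $q$ enforces greater pairwise diversity
and thereby shrinks the effective hypothesis space of the ensemble.
```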