
Scientists want you to smell ancient Egyptian mummies

Popular Science

A mixture of archeology and chemistry brings the aroma of mummification to museums. Visiting a museum could soon be a truly multisensory experience, smells included. Thanks to recent advances in the field of biomolecular archeology, scientists can now detect traces of molecular fingerprints on ancient artifacts. From these tiny particles, scientists can determine how the objects may have smelled.


Outlier-robust estimation of a sparse linear model using $\ell_1$-penalized Huber's $M$-estimator

Neural Information Processing Systems

We study the problem of estimating a $p$-dimensional $s$-sparse vector in a linear model with Gaussian design. In the case where the labels are contaminated by at most $o$ adversarial outliers, we prove that the $\ell_1$-penalized Huber's $M$-estimator based on $n$ samples attains the optimal rate of convergence $(s/n)^{1/2} + (o/n)$, up to a logarithmic factor. For more general design matrices, our results highlight the importance of two properties: the transfer principle and the incoherence property. These properties with suitable constants are shown to yield the optimal rates of robust estimation with adversarial contamination.
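As an illustration of the estimator the abstract studies, here is a minimal proximal-gradient (ISTA-style) sketch of an $\ell_1$-penalized Huber $M$-estimator in NumPy. The function names, the step-size choice, and the tuning constants (`delta`, `lam`) are assumptions made for this example, not the paper's algorithm.

```python
import numpy as np

def soft_threshold(z, t):
    # proximal operator of the l1 norm
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def huber_psi(r, delta):
    # derivative of the Huber loss: linear beyond +/- delta
    return np.clip(r, -delta, delta)

def l1_huber(X, y, lam, delta=1.345, n_iter=500):
    """Proximal-gradient sketch of an l1-penalized Huber M-estimator."""
    n, p = X.shape
    # step size 1/L, with L the Lipschitz constant of the smooth part
    step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ huber_psi(y - X @ beta, delta) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# contaminated sparse regression: a few labels replaced by gross outliers
rng = np.random.default_rng(0)
n, p, s = 200, 50, 3
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)
y[:10] += 50.0  # adversarial label contamination
beta_hat = l1_huber(X, y, lam=0.1)
```

Because the Huber score `huber_psi` is bounded, each outlier contributes at most `delta` to the gradient, which is what keeps the estimate close to `beta_true` despite the corrupted labels.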


On Learning Ising Models under Huber's Contamination Model

Neural Information Processing Systems

We study the problem of learning Ising models in a setting where some of the samples from the underlying distribution can be arbitrarily corrupted. In such a setup, we aim to design statistically optimal estimators in a high-dimensional scaling in which the number of nodes p, the number of edges k and the maximal node degree d are allowed to increase to infinity as a function of the sample size n. Our analysis is based on exploiting moments of the underlying distribution, coupled with novel reductions to univariate estimation. Our proposed estimators achieve an optimal dimension independent dependence on the fraction of corrupted data in the contaminated setting, while also simultaneously achieving high-probability error guarantees with optimal sample-complexity. We corroborate our theoretical results by simulations.
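Huber's contamination model, which the title refers to, can be made concrete with a toy example: exact samples from a three-node Ising chain, an $\varepsilon$-fraction of which are replaced by draws from an arbitrary distribution (here a point mass). The chain parameters and $\varepsilon$ below are illustrative assumptions, not taken from the paper.

```python
import itertools
import numpy as np

def ising_samples(J, n, rng):
    """Exact sampling from a tiny Ising model p(x) ~ exp(x^T J x / 2),
    x in {-1,+1}^p, by enumerating all 2^p states."""
    p = J.shape[0]
    states = np.array(list(itertools.product([-1, 1], repeat=p)))
    logits = 0.5 * np.einsum('ij,jk,ik->i', states, J, states)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    idx = rng.choice(len(states), size=n, p=probs)
    return states[idx]

rng = np.random.default_rng(1)
# chain 0 - 1 - 2 with coupling 0.8 on each edge
J = np.array([[0.0, 0.8, 0.0], [0.8, 0.0, 0.8], [0.0, 0.8, 0.0]])
n, eps = 5000, 0.1
clean = ising_samples(J, n, rng)

# Huber contamination: replace an eps-fraction with an arbitrary Q
corrupt = clean.copy()
m = int(eps * n)
corrupt[:m] = np.array([1, -1, 1])  # adversarial point mass

# the edge correlation E[x0 x1] is positive in the clean data,
# and the contamination biases the naive estimate downward
corr_clean = np.mean(clean[:, 0] * clean[:, 1])
corr_dirty = np.mean(corrupt[:, 0] * corrupt[:, 1])
```

The gap between `corr_clean` and `corr_dirty` is exactly the kind of bias the paper's estimators are designed to control with a dimension-independent dependence on the corrupted fraction.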


A theoretical framework for M-posteriors: frequentist guarantees and robustness properties

Marusic, Juraj, Medina, Marco Avella, Rush, Cynthia

arXiv.org Machine Learning

We provide a theoretical framework for a wide class of generalized posteriors that can be viewed as the natural Bayesian posterior counterpart of the class of M-estimators in the frequentist world. We call the members of this class M-posteriors and show that they are asymptotically normally distributed under mild conditions on the M-estimation loss and the prior. In particular, an M-posterior contracts in probability around a normal distribution centered at an M-estimator, showing frequentist consistency and suggesting some degree of robustness depending on the reference M-estimator. We formalize the robustness properties of the M-posteriors by a new characterization of the posterior influence function and a novel definition of breakdown point adapted for posterior distributions. We illustrate the wide applicability of our theory in various popular models and illustrate their empirical relevance in some numerical examples.
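A quick way to see the robustness the abstract alludes to: compare an M-posterior built from the Huber loss against the standard Gaussian (squared-loss) posterior for a location parameter, both approximated on a grid with a flat prior. All names and the grid construction here are simplifications for illustration, not the paper's framework.

```python
import numpy as np

def huber_rho(r, delta=1.345):
    # Huber loss: quadratic near zero, linear in the tails
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * a - 0.5 * delta**2)

def m_posterior_mean(x, loss, grid):
    """Grid approximation of an M-posterior mean for a location parameter
    under a flat prior: pi_n(theta) ~ exp(-sum_i loss(x_i - theta))."""
    neg_energy = -np.array([loss(x - t).sum() for t in grid])
    w = np.exp(neg_energy - neg_energy.max())
    w /= w.sum()
    return np.sum(w * grid)

rng = np.random.default_rng(2)
x = rng.standard_normal(100)
x[0] = 50.0  # one gross outlier
grid = np.linspace(-5, 5, 2001)

huber_mean = m_posterior_mean(x, huber_rho, grid)           # Huber M-posterior
gauss_mean = m_posterior_mean(x, lambda r: 0.5 * r**2, grid)  # Gaussian posterior
```

The Gaussian posterior mean is dragged toward the outlier (it tracks the sample mean), while the Huber M-posterior stays near the bulk of the data, reflecting the bounded influence formalized by the paper's posterior influence function.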





Reviews: Outlier-robust estimation of a sparse linear model using $\ell_1$-penalized Huber's $M$-estimator

Neural Information Processing Systems

This is a very good paper, and I really enjoyed reading it. The result is strong and the exposition readable. The paper solves an important open problem in robust estimation, and I strongly recommend acceptance, provided the proof is correct.


On Robust Cross Domain Alignment

Chakrabarty, Anish, Basu, Arkaprabha, Das, Swagatam

arXiv.org Machine Learning

The Gromov-Wasserstein (GW) distance is an effective measure of alignment between distributions supported on distinct ambient spaces. Essentially quantifying the mutual departure from isometry, it has found vast usage in domain translation and network analysis. It has long been known to be vulnerable to contamination in the underlying measures. All efforts to introduce robustness into GW have been inspired by similar techniques in optimal transport (OT), which predominantly advocate partial mass transport or unbalancing. In contrast, the cross-domain alignment problem, being fundamentally different from OT, demands specific solutions to tackle diverse applications and contamination regimes. Drawing from robust statistics, we discuss three contextually novel techniques to robustify GW and its variants. For each method, we explore metric properties and robustness guarantees, along with their co-dependencies and individual relations with the GW distance.
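The "departure from isometry" that GW measures can be illustrated with a brute-force toy: a GW-style discrepancy restricted to permutation couplings over small point sets. The real GW distance optimizes over all couplings; this restriction, and all names below, are simplifications for the example.

```python
import itertools
import numpy as np

def gw_permutation(C1, C2):
    """Brute-force GW-style discrepancy over permutation couplings:
    min_sigma sum_{i,j} (C1[i,j] - C2[sigma(i),sigma(j)])^2,
    i.e. the best one-to-one matching of the two distance matrices."""
    n = C1.shape[0]
    best = np.inf
    for sigma in itertools.permutations(range(n)):
        s = np.array(sigma)
        cost = np.sum((C1 - C2[np.ix_(s, s)]) ** 2)
        best = min(best, cost)
    return best

# two isometric point clouds (a rotation) -> zero discrepancy
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = X @ R.T
C1 = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
C2 = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
iso_cost = gw_permutation(C1, C2)

# contaminating a single point breaks the isometry and blows up the cost,
# which is the vulnerability the abstract sets out to fix
Y_bad = Y.copy()
Y_bad[0] += 100.0
C2_bad = np.linalg.norm(Y_bad[:, None] - Y_bad[None, :], axis=-1)
bad_cost = gw_permutation(C1, C2_bad)
```

One corrupted support point inflates every distance involving it, so the discrepancy grows without bound under contamination, motivating the robustified variants the paper proposes.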



