corruption
Fast Algorithms for Robust PCA via Gradient Descent
We consider the problem of Robust PCA in the fully and partially observed settings. Without corruptions, this is the well-known matrix completion problem. From a statistical standpoint this problem has been recently well-studied, and conditions on when recovery is possible (how many observations do we need, how many corruptions can we tolerate) via polynomial-time algorithms is by now understood. This paper presents and analyzes a non-convex optimization approach that greatly reduces the computational complexity of the above problems, compared to the best available algorithms. In particular, in the fully observed case, with $r$ denoting rank and $d$ dimension, we reduce the complexity from $O(r^2d^2\log(1/\epsilon))$ to $O(rd^2\log(1/\epsilon))$ -- a big savings when the rank is big. For the partially observed case, we show the complexity of our algorithm is no more than $O(r^4d\log(d)\log(1/\epsilon))$. Not only is this the best-known run-time for a provable algorithm under partial observation, but in the setting where $r$ is small compared to $d$, it also allows for near-linear-in-$d$ run-time that can be exploited in the fully-observed case as well, by simply running our algorithm on a subset of the observations.
Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise
The growing importance of massive datasets with the advent of deep learning makes robustness to label noise a critical property for classifiers to have. Sources of label noise include automatic labeling for large datasets, non-expert labeling, and label corruption by data poisoning adversaries. In the latter case, corruptions may be arbitrarily bad, even so bad that a classifier predicts the wrong labels with high confidence. To protect against such sources of noise, we leverage the fact that a small set of clean labels is often easy to procure. We demonstrate that robustness to label noise up to severe strengths can be achieved by using a set of trusted data with clean labels, and propose a loss correction that utilizes trusted examples in a data-efficient manner to mitigate the effects of label noise on deep neural network classifiers. Across vision and natural language processing tasks, we experiment with various label noises at several strengths, and show that our method significantly outperforms existing methods.
- North America > United States > Maryland > Prince George's County > College Park (0.15)
- North America > United States > Tennessee > Shelby County > Memphis (0.04)
- Information Technology (1.00)
- Automobiles & Trucks (0.94)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (5 more...)
- Information Technology (0.68)
- Education (0.46)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Asia > China > Hong Kong (0.04)
- (4 more...)
- Information Technology (0.46)
- Health & Medicine (0.46)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Europe > Montenegro (0.04)
- Asia > Middle East > Kuwait > Capital Governorate > Kuwait City (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (10 more...)
- Media (1.00)
- Leisure & Entertainment (1.00)
- Law (1.00)
- (4 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
- North America > United States > California > Alameda County > Livermore (0.04)
- North America > United States > Tennessee > Anderson County > Oak Ridge (0.04)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- (3 more...)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Texas (0.04)
- North America > United States > California > Santa Clara County > San Jose (0.04)
- Information Technology > Security & Privacy (0.66)
- Health & Medicine > Therapeutic Area > Neurology (0.46)