model prediction
2cd5737c59645f7ef23b2842b705edf2-Paper-Conference.pdf
Image classification accuracy on the ImageNet dataset has been a barometer for progress in computer vision over the last decade. Several recent papers have questioned the degree to which the benchmark remains useful to the community [33, 3, 31, 42, 36], yet innovations continue to contribute gains to performance, with today's largest models achieving 90%+ top-1 accuracy. To help contextualize progress on ImageNet and provide a more meaningful evaluation for today's state-of-the-art models, we manually review and categorize every remaining mistake that a few top models make and provide insights into the long tail of errors on one of the most benchmarked datasets in computer vision. We focus on the multi-label subset evaluation of ImageNet, where today's best models achieve upwards of 97% top-1 accuracy. Our analysis reveals that nearly half of the supposed mistakes are not mistakes at all, and we uncover new valid multi-labels, demonstrating that, without careful review, we are significantly underestimating the performance of these models. On the other hand, we also find that today's best models still make a significant number of mistakes (40%) that are obviously wrong to human reviewers. To calibrate future progress on ImageNet, we provide an updated multi-label evaluation set, and we curate ImageNet-Major: a 68-example "major error" slice of the obvious mistakes made by today's top models, a slice where models should achieve near perfection, but today are far from doing so.
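To make the multi-label evaluation concrete, here is a minimal sketch of how top-1 accuracy could be scored when each image has a set of valid labels rather than a single one. The function names, the integer class IDs, and the toy data are illustrative assumptions, not taken from the paper or its evaluation code.

```python
from typing import List, Sequence, Set


def multilabel_top1_accuracy(
    predictions: Sequence[int], valid_labels: Sequence[Set[int]]
) -> float:
    """Fraction of images whose top-1 prediction is in that image's valid label set.

    Assumption: a prediction counts as correct if it matches ANY valid label.
    """
    if len(predictions) != len(valid_labels):
        raise ValueError("predictions and valid_labels must have the same length")
    if not predictions:
        raise ValueError("need at least one example")
    correct = sum(1 for p, labels in zip(predictions, valid_labels) if p in labels)
    return correct / len(predictions)


def remaining_mistakes(
    predictions: Sequence[int], valid_labels: Sequence[Set[int]]
) -> List[int]:
    """Indices of examples still counted as errors, i.e. candidates for manual review."""
    return [
        i
        for i, (p, labels) in enumerate(zip(predictions, valid_labels))
        if p not in labels
    ]


# Toy data (hypothetical class IDs).
preds = [3, 7, 1, 4]
single_labels = [{3}, {2}, {5}, {9}]          # one original label per image
multi_labels = [{3}, {2, 7}, {5}, {4, 9}]     # extra valid labels added on review

single_acc = multilabel_top1_accuracy(preds, single_labels)
multi_acc = multilabel_top1_accuracy(preds, multi_labels)
to_review = remaining_mistakes(preds, multi_labels)
```

In this toy case, adding valid labels turns two of the three apparent mistakes into correct predictions, which is the kind of underestimation the abstract describes.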
All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation Liyao Tang
This approach may, however, hinder the comprehensive exploitation of unlabeled data points. We hypothesize that this selective usage arises from the noise in pseudo-labels generated on unlabeled data. The noise in pseudo-labels may result in significant discrepancies between pseudo-labels and model predictions, thus confusing and affecting the model training greatly.
1 Data Ingestion
For all other remaining architectures, the reported results are from private datasets. Neck Shaft Angle (NSA) cannot be estimated. Additionally, [?] requires estimation of the diaphysis Figure 4: Repeatability of the femur morphometry extraction method as measured by error distributions for a) the landmarks/anatomical sizes and b) axis alignment identified by the adapted method. Do the main claims made in the abstract and introduction accurately reflect the paper's Did you specify all the training details (e.g., data splits, hyperparameters, how they were Data splits are available in the GitHub repository. Did you report error bars (e.g., with respect to the random seed after running ex- Did you include the total amount of compute and the type of resources used (e.g., Did you mention the license of the assets?
Self-Adaptive Training: beyond Empirical Risk Minimization
This problem is important to robustly learning from data that are corrupted by, e.g., random noise and adversarial examples. The standard empirical risk minimization (ERM) for such data, however, may easily overfit noise and thus suffer from sub-optimal performance. In this paper, we observe that model predictions can substantially benefit the training process: self-adaptive training significantly mitigates the overfitting issue and improves generalization over ERM under both random and adversarial noise.
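As a rough illustration of how model predictions might feed back into training, the sketch below blends each example's training target with the model's current prediction through an exponential moving average. The update rule, the `alpha` value, and all function names are assumptions made for illustration; this excerpt does not specify the actual algorithm.

```python
import math
from typing import List, Sequence


def update_soft_target(
    target: Sequence[float], prediction: Sequence[float], alpha: float = 0.9
) -> List[float]:
    """Move a soft target toward the model's predicted distribution.

    Assumed rule (not stated in the excerpt): t <- alpha * t + (1 - alpha) * p.
    """
    if len(target) != len(prediction):
        raise ValueError("target and prediction must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * t + (1.0 - alpha) * p for t, p in zip(target, prediction)]


def cross_entropy(
    target: Sequence[float], prediction: Sequence[float], eps: float = 1e-12
) -> float:
    """Cross-entropy of the prediction against a (possibly soft) target."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, prediction))


# Toy example: the given label says class 0, the model consistently says class 1.
noisy_target = [1.0, 0.0]
model_prediction = [0.2, 0.8]

target = list(noisy_target)
for _ in range(50):  # one update per (hypothetical) epoch
    target = update_soft_target(target, model_prediction)

loss_vs_noisy = cross_entropy(noisy_target, model_prediction)
loss_vs_adapted = cross_entropy(target, model_prediction)
```

With a fixed one-hot target, plain ERM keeps pushing the model toward the possibly corrupted label. With the moving-average target, a consistent prediction gradually outweighs it, so the loss against the adapted target ends up lower than against the noisy one.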
max
To clarify where the adversarial brittleness truly comes from, we need to figure out how the robust and non-robust features in data manifold subtly manipulate feature representation and fool model prediction, by directly handling them in the feature space. To address it, we propose a way to precisely distill intermediate features into robust and non-robust features by employing Information Bottleneck (IB) [17, 18, 19].