 terrier


1 Details for Dataset Partitioning Here we provide the dataset partitioning results for ImageNet [

Neural Information Processing Systems

Novel category names: ['High_Jump', 'Front_Crawl', 'Pole_Vault', 'Hammer_Throw', All experiments are conducted under the 16-shot setting. An incremental Bayesian approach tested on 101 object categories. Conditional prompt learning for vision-language models.


Benchmarking Robustness to Adversarial Image Obfuscations

Neural Information Processing Systems

Advances in computer vision have led to classifiers that nearly match human performance in many applications. However, while the human visual system is remarkably versatile in extracting semantic meaning from even degraded and heavily obfuscated images, today's visual classifiers lag significantly behind in emulating the same robustness, and often yield incorrect outputs in the presence of natural and adversarial degradations.


Automated Classification of Model Errors on ImageNet

Neural Information Processing Systems

While the ImageNet dataset has been driving computer vision research over the past decade, significant label noise and ambiguity have made top-1 accuracy an insufficient measure of further progress.




Terrier: A Deep Learning Repeat Classifier

Turnbull, Robert, Young, Neil D., Tescari, Edoardo, Skerratt, Lee F., Kosch, Tiffany A.

arXiv.org Artificial Intelligence

Repetitive DNA sequences underpin genome architecture and evolutionary processes, yet they remain challenging to classify accurately. Existing tools often struggle to classify repeats from divergent taxa because of biases in reference libraries, limiting our understanding of repeat evolution and function. Terrier is a deep learning model designed to overcome these challenges: it classifies repetitive DNA sequences under the RepeatMasker schema, using a publicly available, curated repeat sequence library for training. Trained on RepBase, which includes over 100,000 repeat families (four times more than Dfam), Terrier maps 97.1% of RepBase sequences to RepeatMasker categories, offering the most comprehensive classification system available. When benchmarked against DeepTE, TERL, and TEclass2 in model organisms (rice and fruit flies), Terrier achieved superior accuracy while classifying a broader range of sequences. Further validation in non-model amphibian and flatworm genomes highlights its effectiveness in improving classification in non-model species, facilitating research on repeat-driven evolution, genomic instability, and phenotypic variation.


Benchmarking Robustness to Adversarial Image Obfuscations

Stimberg, Florian, Chakrabarti, Ayan, Lu, Chun-Ta, Hazimeh, Hussein, Stretcu, Otilia, Qiao, Wei, Liu, Yintao, Kaya, Merve, Rashtchian, Cyrus, Fuxman, Ariel, Tek, Mehmet, Gowal, Sven

arXiv.org Artificial Intelligence

Automated content filtering and moderation is an important tool that allows online platforms to build thriving user communities that facilitate cooperation and prevent abuse. Unfortunately, resourceful actors try to bypass automated filters in a bid to post content that violates platform policies and codes of conduct. To reach this goal, these malicious actors may obfuscate policy-violating images (e.g., by overlaying harmful images with carefully selected benign images or visual patterns) to prevent machine learning models from reaching the correct decision. In this paper, we invite researchers to tackle this specific issue and present a new image benchmark. This benchmark, based on ImageNet, simulates the type of obfuscations created by malicious actors. It goes beyond ImageNet-$\textrm{C}$ and ImageNet-$\bar{\textrm{C}}$ by proposing general, drastic, adversarial modifications that preserve the original content intent. It aims to tackle a more common adversarial threat than the one considered by $\ell_p$-norm bounded adversaries. We evaluate 33 pretrained models on the benchmark and train models with different augmentations, architectures and training methods on subsets of the obfuscations to measure generalization. We hope this benchmark will encourage researchers to test their models and methods and try to find new approaches that are more robust to these obfuscations.
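
As a rough illustration of the overlay-style obfuscation this abstract describes, the sketch below alpha-blends a benign pattern over a grayscale image. It is not the benchmark's implementation: the function name `overlay_obfuscate`, the list-of-lists image representation, and the default `alpha` are all assumptions made for this example.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of floats in [0.0, 1.0].
Image = List[List[float]]


def overlay_obfuscate(image: Image, pattern: Image, alpha: float = 0.5) -> Image:
    """Alpha-blend `pattern` over `image`.

    alpha=0.0 returns the original pixel values; alpha=1.0 returns the pattern.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(image) != len(pattern) or any(
        len(img_row) != len(pat_row) for img_row, pat_row in zip(image, pattern)
    ):
        raise ValueError("image and pattern must have the same shape")
    # Per-pixel convex combination of the image and the overlaid pattern.
    return [
        [(1.0 - alpha) * px + alpha * pt for px, pt in zip(img_row, pat_row)]
        for img_row, pat_row in zip(image, pattern)
    ]
```

A larger `alpha` hides more of the original content, so a robustness evaluation in this spirit would compare a classifier's prediction on `image` with its prediction on the blended result.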


Automated Classification of Model Errors on ImageNet

Peychev, Momchil, Müller, Mark Niklas, Fischer, Marc, Vechev, Martin

arXiv.org Artificial Intelligence

While the ImageNet dataset has been driving computer vision research over the past decade, significant label noise and ambiguity have made top-1 accuracy an insufficient measure of further progress. To address this, new label-sets and evaluation protocols have been proposed for ImageNet, showing that state-of-the-art models already achieve over 95% accuracy and shifting the focus to investigating why the remaining errors persist. Recent work in this direction employed a panel of experts to manually categorize all remaining classification errors for two selected models. However, this process is time-consuming, prone to inconsistencies, and requires trained experts, making it unsuitable for regular model evaluation and thus limiting its utility. To overcome these limitations, we propose the first automated error classification framework, a valuable tool to study how modeling choices affect error distributions. We use our framework to comprehensively evaluate the error distribution of over 900 models. Perhaps surprisingly, we find that across model architectures, scales, and pre-training corpora, top-1 accuracy is a strong predictor for the portion of all error types. In particular, we observe that the portion of severe errors drops significantly with top-1 accuracy, indicating that, while it underreports a model's true performance, it remains a valuable performance metric.
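
To make the gap between top-1 accuracy and a multi-label evaluation concrete, here is a minimal sketch. It assumes a simplified protocol in which a prediction counts as correct when it falls in a set of acceptable labels for the image; the function names and the toy data are hypothetical, and this is not the paper's error classification framework.

```python
from typing import Sequence, Set


def top1_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions equal to the single reference label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not predictions:
        raise ValueError("need at least one example")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def multilabel_accuracy(
    predictions: Sequence[int], label_sets: Sequence[Set[int]]
) -> float:
    """Fraction of predictions contained in the set of acceptable labels.

    Assumption: each image has a set of valid labels, and any one of them
    counts as a correct prediction.
    """
    if len(predictions) != len(label_sets):
        raise ValueError("predictions and label_sets must have the same length")
    if not predictions:
        raise ValueError("need at least one example")
    return sum(p in ys for p, ys in zip(predictions, label_sets)) / len(label_sets)


# Toy data: the third image has two acceptable labels (3 and 0).
preds = [1, 2, 3, 4]
single_labels = [1, 2, 0, 0]
label_sets = [{1}, {2}, {3, 0}, {0}]
top1 = top1_accuracy(preds, single_labels)
multi = multilabel_accuracy(preds, label_sets)
```

On this toy data the third prediction is an error under the single-label protocol but correct under the multi-label one, which is the sense in which top-1 accuracy can underreport performance.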


Towards Mitigating Spurious Correlations in the Wild: A Benchmark and a more Realistic Dataset

Joshi, Siddharth, Yang, Yu, Xue, Yihao, Yang, Wenhan, Mirzasoleiman, Baharan

arXiv.org Artificial Intelligence

Deep neural networks often exploit non-predictive features that are spuriously correlated with class labels, leading to poor performance on groups of examples without such features. Despite the growing body of recent work on remedying spurious correlations, the lack of a standardized benchmark hinders reproducible evaluation and comparison of the proposed solutions. To address this, we present SpuCo, a Python package with modular implementations of state-of-the-art solutions, enabling easy and reproducible evaluation of current methods. Using SpuCo, we demonstrate the limitations of existing datasets and evaluation schemes in validating the learning of predictive features over spurious ones. To overcome these limitations, we propose two new vision datasets: (1) SpuCoMNIST, a synthetic dataset that enables simulating the effect of real-world data properties, e.g., the difficulty of learning the spurious feature, as well as noise in the labels and features; (2) SpuCoAnimals, a large-scale dataset curated from ImageNet that captures spurious correlations in the wild much more closely than existing datasets. These contributions highlight the shortcomings of current methods and provide a direction for future research in tackling spurious correlations. SpuCo, containing the benchmark and datasets, can be found at https://github.com/BigML-CS-UCLA/SpuCo, with detailed documentation available at https://spuco.readthedocs.io/en/latest/.
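
The group-level failure mode described in this abstract is commonly summarized by per-group and worst-group accuracy. The sketch below computes both in plain Python. It does not use the SpuCo API; the function names, the group identifiers, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def group_accuracies(
    predictions: Sequence[int],
    labels: Sequence[int],
    groups: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Accuracy computed separately for each group identifier."""
    if not (len(predictions) == len(labels) == len(groups)):
        raise ValueError("all inputs must have the same length")
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for p, y, g in zip(predictions, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    return {g: correct[g] / total[g] for g in total}


def worst_group_accuracy(
    predictions: Sequence[int],
    labels: Sequence[int],
    groups: Sequence[Hashable],
) -> float:
    """Lowest per-group accuracy; raises ValueError on empty input."""
    per_group = group_accuracies(predictions, labels, groups)
    if not per_group:
        raise ValueError("need at least one example")
    return min(per_group.values())


# Toy data: "majority" examples carry the spurious feature, "minority" ones do not.
preds = [0, 0, 1, 1, 0, 1]
labels = [0, 0, 1, 1, 1, 1]
groups = ["majority"] * 4 + ["minority"] * 2
per_group = group_accuracies(preds, labels, groups)
worst = worst_group_accuracy(preds, labels, groups)
```

Here the overall accuracy is 5/6, yet the minority group reaches only 0.5, which is the kind of gap an average over all examples hides.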