original paper
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- North America > Canada (0.05)
- Asia > China > Guangdong Province > Shenzhen (0.05)
Improving the Sensitivity of Backdoor Detectors via Class Subspace Orthogonalization
Yang, Guangmingmei, Miller, David J., Kesidis, George
Most post-training backdoor detection methods rely on attacked models exhibiting extreme outlier detection statistics for the target class of an attack, compared to non-target classes. However, these approaches may fail: (1) when some (non-target) classes are easily discriminable from all others, in which case they may naturally achieve extreme detection statistics (e.g., decision confidence); and (2) when the backdoor is subtle, i.e., with its features weak relative to intrinsic class-discriminative features. A key observation is that the backdoor target class has contributions to its detection statistic from both the backdoor trigger and from its intrinsic features, whereas non-target classes only have contributions from their intrinsic features. To achieve more sensitive detectors, we thus propose to suppress intrinsic features while optimizing the detection statistic for a given class. For non-target classes, such suppression will drastically reduce the achievable statistic, whereas for the target class the (significant) contribution from the backdoor trigger remains. In practice, we formulate a constrained optimization problem, leveraging a small set of clean examples from a given class, and optimizing the detection statistic while orthogonalizing with respect to the class's intrinsic features. We dub this plug-and-play approach Class Subspace Orthogonalization (CSO) and assess it against challenging mixed-label and adaptive attacks.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
To Err Is Human: Systematic Quantification of Errors in Published AI Papers via LLM Analysis
Bianchi, Federico, Kwon, Yongchan, Izzo, Zachary, Zhang, Linjun, Zou, James
How many mistakes do published AI papers contain? Peer-reviewed publications form the foundation upon which new research and knowledge are built. Errors that persist in the literature can propagate unnoticed, creating confusion in follow-up studies and complicating reproducibility. The accelerating pace of research and the increasing demands on the peer-review system make such mistakes harder to detect and avoid. To address this, we developed a Paper Correctness Checker based on GPT-5 to systematically identify mistakes in papers previously published at top AI conferences and journals. Our analysis focuses on objective mistakes-e.g., errors in formulas, derivations, calculations, figures, and tables-that have a clearly verifiable ground truth. We intentionally exclude subjective considerations such as novelty, importance, or writing quality. We find that published papers contain a non-negligible number of objective mistakes and that the average number of mistakes per paper has increased over time-from 3.8 in NeurIPS 2021 to 5.9 in NeurIPS 2025 (55.3% increase); from 4.1 in ICLR 2018 to 5.2 in ICLR 2025; and from 5.0 in TMLR 2022/23 to 5.5 in TMLR 2025. Human experts reviewed 316 potential mistakes identified by the AI Checker and confirmed that 263 were actual mistakes, corresponding to a precision of 83.2%. While most identified issues are relatively minor, correcting them would reduce confusion in the literature and strengthen reproducibility. The AI Checker also surfaced potentially more substantive mistakes that could affect the interpretation of results. Moreover, we show that the AI Checker can propose correct fixes for 75.8% of the identified mistakes. Overall, this study highlights the potential of frontier LLMs to detect and correct objective mistakes in published papers, helping to establish a firmer foundation of knowledge.
Reproducibility Report: Test-Time Training on Nearest Neighbors for Large Language Models
Zhou, Boyang, Lindqvist, Johan, Li, Lindsey
We reproduce the central claims of Test-Time Training on Nearest Neighbors for Large Language Models (Hardt and Sun, 2024), which proposes adapting a language model at inference time by fine-tuning on retrieved nearest-neighbor sequences. Using pretrained RoBERTa embeddings indexed with Faiss, we retrieve 20 neighbors per test input and apply one gradient update per neighbor across GPT-2 (117M, 774M), GPT-Neo (1.3B), and R1-Distilled-Qwen2.5-1.5B. Our experiments confirm that test-time training significantly reduces perplexity and bits-per-byte metrics across diverse domains from The Pile, with the largest improvements in structured or specialized datasets such as GitHub and EuroParl. We further validate that models not pretrained on The Pile benefit more from this adaptation than models already trained on similar data, allowing smaller models to approach the performance of larger ones. Due to infrastructure limitations, we introduce a memory-efficient retrieval implementation that loads only required line offsets rather than entire files, reducing RAM requirements from over 128 GB per server to 32 GB. We also extend the original study by evaluating R1-Distilled-Qwen2.5-1.5B, showing that test-time training yields consistent gains even for modern reasoning-optimized architectures. Overall, our results support the robustness and generality of nearest-neighbor test-time training while highlighting practical considerations for reproducing large-scale retrieval-augmented adaptation.
- North America > Canada > Quebec > Montreal (0.24)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
- North America > Canada > Quebec > Montreal (0.24)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
reproducibility is of central importance to the whole NeurIPS community and was also unanimously identified during 2
First of all, we thank all reviewers for their valuable time and feedback. We thank the reviewers for pointing out typos and grammatical errors, which we of course have fixed now. We are afraid that the reviewer might have misunderstood some parts of the paper. We refer to the original paper for further details about the approximation of the variational posterior. We have made this more clear in the main paper now.