Insights into Pre-training via Simpler Synthetic Tasks

Wu, Yuhuai

Neural Information Processing Systems

Pre-training produces representations that are effective for a wide range of downstream tasks, but it is still unclear what properties of pre-training are necessary for effective gains. Notably, recent work shows that even pre-training on synthetic tasks can achieve significant gains on downstream tasks. In this work, we perform three experiments that iteratively simplify pre-training and show that the simplifications still retain much of its gains. First, building on prior work, we perform a systematic evaluation of three existing synthetic pre-training methods on six downstream tasks. We find that the best synthetic pre-training method, LIME, attains an average of 67% of the benefits of natural pre-training. Second, to our surprise, we find that pre-training on a simple and generic synthetic task defined by the set function achieves 65% of the benefits, almost matching LIME. Third, we find that 39% of the benefits can be attained by using merely the parameter statistics of synthetic pre-training.
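
The abstract does not spell out the set-function task, so as a concrete illustration, here is a minimal sketch of one plausible instantiation: a synthetic dataset in which the model must map a random token sequence to its set of distinct tokens. The vocabulary size, sequence lengths, and first-appearance ordering are assumptions for the sketch, not details taken from the paper.

```python
# Hedged sketch of a generic "set function" synthetic pre-training task:
# the target for each random token sequence is its set of distinct tokens.
# VOCAB_SIZE, MAX_LEN, and the dedup order are illustrative assumptions.
import random

VOCAB_SIZE = 100  # assumed synthetic vocabulary size
MAX_LEN = 20      # assumed maximum input length

def make_set_example(rng: random.Random):
    """Sample one (input sequence, deduplicated target) pair."""
    length = rng.randint(1, MAX_LEN)
    inputs = [rng.randrange(VOCAB_SIZE) for _ in range(length)]
    # Distinct tokens in order of first appearance (one possible convention).
    target = list(dict.fromkeys(inputs))
    return inputs, target

rng = random.Random(0)
for _ in range(3):
    x, y = make_set_example(rng)
    print(x, "->", y)
```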


Improving Local Fidelity Through Sampling and Modeling Nonlinearity

Shrestha, Sanjeev, Dubey, Rahul, Liu, Hui

arXiv.org Artificial Intelligence

With the increasing complexity of black-box machine learning models and their adoption in high-stakes areas, it is critical to provide explanations for their predictions. Local Interpretable Model-agnostic Explanation (LIME) is a widely used technique that explains the prediction of any classifier by learning an interpretable model locally around the predicted instance. However, it assumes that the local decision boundary is linear and fails to capture non-linear relationships, leading to incorrect explanations. In this paper, we propose a novel method that can generate high-fidelity explanations. Multivariate adaptive regression splines (MARS) are used to model non-linear local boundaries, effectively capturing the underlying behavior of the reference model and thereby enhancing the local fidelity of the explanation. Additionally, we utilize the N-ball sampling technique, which samples directly from the desired distribution instead of reweighting samples as done in LIME, further improving the faithfulness score. We evaluate our method on three UCI datasets across different classifiers and varying kernel widths. Experimental results show that our method yields more faithful explanations than the baselines, achieving an average reduction of 37% in root mean square error and significantly improving local fidelity.
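
To make the sampling step concrete, below is a minimal sketch of N-ball sampling around an instance followed by a local surrogate fit. The `sample_n_ball` and `local_surrogate` helpers are hypothetical names, and the surrogate here is ordinary least squares from scikit-learn rather than MARS (which has no scikit-learn implementation); the radius and sample count are assumed knobs.

```python
# Hedged sketch of N-ball sampling for local surrogates: draw perturbations
# uniformly from a ball of radius `radius` around the instance, then fit an
# interpretable local model. A linear surrogate stands in for the paper's MARS.
import numpy as np
from sklearn.linear_model import LinearRegression

def sample_n_ball(center: np.ndarray, radius: float, n: int, rng) -> np.ndarray:
    """Uniform samples from the d-ball of the given radius around `center`."""
    d = center.shape[0]
    directions = rng.normal(size=(n, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Scaling by U^(1/d) makes points uniform in volume, not just on the sphere.
    radii = radius * rng.uniform(size=(n, 1)) ** (1.0 / d)
    return center + radii * directions

def local_surrogate(predict_fn, instance, radius=1.0, n_samples=1000, seed=0):
    """Fit a surrogate to `predict_fn` near `instance`; no reweighting needed."""
    rng = np.random.default_rng(seed)
    X = sample_n_ball(np.asarray(instance, dtype=float), radius, n_samples, rng)
    y = predict_fn(X)  # black-box predictions, e.g. class probabilities
    return LinearRegression().fit(X, y)

# Toy usage with a stand-in black box:
surrogate = local_surrogate(lambda X: (X ** 2).sum(axis=1), np.zeros(5), radius=0.5)
print(surrogate.coef_)
```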



770f8e448d07586afbf77bb59f698587-AuthorFeedback.pdf

Neural Information Processing Systems

Thank you for your thoughtful feedback. We will first discuss common themes and then specific reviewer comments. Even though ExpO is "simple" (in that it connects existing concepts, albeit in a novel way), we believe [...] We will add a discussion as outlined below. [...] by Qin et al. does not consider interpretability at all. Several methods rely on domain knowledge: "Learning credible ..."



On Thin Ice: Towards Explainable Conservation Monitoring via Attribution and Perturbations

Zhou, Jiayi, Aghakishiyeva, Günel, Arya, Saagar, Dale, Julian, Poling, James David, Houliston, Holly R., Womble, Jamie N., Larsen, Gregory D., Johnston, David W., Bent, Brinnae

arXiv.org Artificial Intelligence

Computer vision can accelerate ecological research and conservation monitoring, yet adoption in ecology lags, in part because of a lack of trust in black-box neural-network-based models. We seek to address this challenge by applying post-hoc explanations that provide evidence for predictions and document limitations important to field deployment. Using aerial imagery from Glacier Bay National Park, we train a Faster R-CNN to detect pinnipeds (harbor seals) and generate explanations via gradient-based class activation mapping (HiResCAM, LayerCAM), local interpretable model-agnostic explanations (LIME), and perturbation-based explanations. We assess explanations along three axes relevant to field use: (i) localization fidelity: whether high-attribution regions coincide with the animal rather than background context; (ii) faithfulness: whether deletion/insertion tests produce corresponding changes in detector confidence; and (iii) diagnostic utility: whether explanations reveal systematic failure modes. Explanations concentrate on seal torsos and contours rather than surrounding ice and rock, and removing the seals reduces detection confidence, providing model-based evidence for true positives. The analysis also uncovers recurrent error sources, including confusion of seals with black ice and rocks. We translate these findings into actionable next steps for model development, including more targeted data curation and augmentation. By pairing object detection with post-hoc explainability, we can move beyond "black-box" predictions toward auditable, decision-supporting tools for conservation monitoring.
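
As an illustration of the deletion-style faithfulness test mentioned above, here is a minimal sketch: mask the highest-attribution pixels in stages and record how a detector's confidence falls. `deletion_curve` and `detector_confidence` are hypothetical stand-ins, not the authors' pipeline, and the toy uses a single-channel image for simplicity.

```python
# Hedged sketch of a deletion ("perturbation") faithfulness test: progressively
# remove the most-attributed pixels and track the detector's confidence. A
# faithful explanation should produce a steep early drop in the curve.
import numpy as np

def deletion_curve(image, attribution, detector_confidence, steps=10, fill=0.0):
    """Return confidences as top-attributed pixels are removed step by step."""
    order = np.argsort(attribution.ravel())[::-1]  # most important first
    masked = image.copy()
    confidences = [detector_confidence(masked)]
    chunk = len(order) // steps  # remainder pixels are ignored in this sketch
    for s in range(steps):
        idx = order[s * chunk:(s + 1) * chunk]
        ys, xs = np.unravel_index(idx, attribution.shape)
        masked[ys, xs] = fill  # delete the next-most-important region
        confidences.append(detector_confidence(masked))
    return confidences

# Toy usage with a fake single-channel image and a trivial "detector":
rng = np.random.default_rng(0)
img = rng.uniform(size=(64, 64))
attr = img.copy()  # pretend attribution equals brightness
print(deletion_curve(img, attr, lambda x: float(x.mean())))
```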


