

License of the assets

Neural Information Processing Systems

Licence for the code: We use the code for MS-TCN [13], ASRF [24], and LAS [9], all of which are under the MIT License according to https://opensource.org/licenses/MIT. For the Jigsaws [18] dataset, we follow the data use agreement according to https://cs.jhu.
Action classification: Action classification is the task of identifying a single action, as opposed to a sequence of actions. Several methods use 2D CNNs to extract frame-wise features from an input video, which are then combined to predict a coarse action taking place in the video [56, 39, 59]. There are also several works that perform action classification from kinematic data [2, 12].
Action segmentation: Action segmentation is the problem of segmenting an input stream of data, labeling each frame according to the action that is being carried out. Earlier methods for action segmentation employed hidden Markov models [33, 22]. More recently, convolutional neural networks [58, 26] and recurrent neural networks [50] have been applied to this problem. Inspired by the success of temporal convolutional networks (TCNs) in speech synthesis, [37] adapted these models to action segmentation. MS-TCN [13], which uses a multi-stage TCN architecture, has become one of the most widely used architectures for action segmentation. Although these methods achieve high frame-wise accuracy, they still produce a significant number of over-segmentation errors. To address this, several boundary-aware methods have been developed that perform temporal smoothing of the frame-wise predictions [57, 24]. These methods use ground-truth boundary information to train a binary classification network to perform boundary detection. The boundary estimates are then used to aggregate the frame-wise predictions either in a soft manner (boundary-aware pooling) or by setting a hard threshold. However, for elemental actions such as the functional primitives in the StrokeRehab dataset, the duration of each action is very short. As a result, the boundaries between actions can be hard to detect or even hard to define (see Figure 4).
Sequence-to-sequence models: Our proposed method is based on sequence-to-sequence (seq2seq) models. These models allow us to learn a mapping from a variable-length input sequence to a variable-length output sequence [53].
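The hard-threshold variant of boundary-based aggregation mentioned above can be illustrated with a small sketch. It is not the implementation of any of the cited methods: the function name `refine_with_boundaries`, the default threshold of 0.5, and the use of a majority vote inside each segment are assumptions made for illustration only.

```python
from collections import Counter
from typing import List, Sequence


def refine_with_boundaries(
    frame_labels: Sequence[int],
    boundary_probs: Sequence[float],
    threshold: float = 0.5,  # assumed value, not taken from the source
) -> List[int]:
    """Smooth frame-wise labels by majority vote inside each detected segment."""
    if len(frame_labels) != len(boundary_probs):
        raise ValueError("inputs must have one entry per frame")
    n = len(frame_labels)
    if n == 0:
        return []
    # A frame opens a new segment when its boundary probability reaches
    # the hard threshold; frame 0 always opens the first segment.
    starts = [0] + [t for t in range(1, n) if boundary_probs[t] >= threshold]
    refined: List[int] = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else n
        segment = frame_labels[start:end]
        # Replace every label in the segment with its most frequent label.
        majority = Counter(segment).most_common(1)[0][0]
        refined.extend([majority] * (end - start))
    return refined
```

Isolated mislabeled frames inside a segment are overwritten by the segment's dominant label, which is how this kind of smoothing reduces over-segmentation. When actions are very short, a missed or spurious boundary merges or splits whole actions, which is the weakness the passage points out.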


Towards Unbounded Machine Unlearning

Neural Information Processing Systems

Deep machine unlearning is the problem of 'removing' from a trained neural network a subset of its training set. This problem is very timely and has many applications, including the key tasks of removing biases (RB), resolving confusion (RC) (caused by mislabelled data in trained models), as well as allowing users to exercise their 'right to be forgotten' to protect User Privacy (UP). This paper is the first, to our knowledge, to study unlearning for different applications (RB, RC, UP), with the view that each has its own desiderata, definitions for 'forgetting' and associated metrics for forget quality. For UP, we propose a novel adaptation of a strong Membership Inference Attack for unlearning. We also propose SCRUB, a novel unlearning algorithm, which is the only method that is consistently a top performer for forget quality across the different application-dependent metrics for RB, RC, and UP. At the same time, SCRUB is also consistently a top performer on metrics that measure model utility (i.e.


[Figure: MAE vs. Oracle]

Neural Information Processing Systems

Evaluating the performance of machine learning models on diverse and underrepresented subgroups is essential for ensuring fairness and reliability in real-world applications. However, accurately assessing model performance becomes challenging due to two main issues: (1) a scarcity of test data, especially for small subgroups, and (2) possible distributional shifts in the model's deployment setting, which may not align with the available test data. In this work, we introduce 3S Testing, a deep generative modeling framework to facilitate model evaluation by generating synthetic test sets for small subgroups and simulating distributional shifts. Our experiments demonstrate that 3S Testing outperforms traditional baselines, including real test data alone, in estimating model performance on minority subgroups and under plausible distributional shifts. In addition, 3S offers intervals around its performance estimates, exhibiting superior coverage of the ground truth compared to existing approaches. Overall, these results raise the question of whether we need a paradigm shift away from limited real test data towards synthetic test data.


What You See is What You Classify: Black Box Attributions

Neural Information Processing Systems

An important step towards explaining deep image classifiers lies in the identification of image regions that contribute to individual class scores in the model's output. However, doing this accurately is a difficult task due to the black-box nature of such networks. Most existing approaches find such attributions either using activations and gradients or by repeatedly perturbing the input. We instead address this challenge by training a second deep network, the Explainer, to predict attributions for a pre-trained black-box classifier, the Explanandum. These attributions are provided in the form of masks that only show the classifier-relevant parts of an image, masking out the rest. Our approach produces sharper and more boundary-precise masks when compared to the saliency maps generated by other methods. Moreover, unlike most existing approaches, ours is capable of directly generating very distinct class-specific masks in a single forward pass. This makes the proposed method very efficient during inference. We show that our attributions are superior to established methods both visually and quantitatively with respect to the PASCAL VOC-2007 and Microsoft COCO-2014 datasets.


Scalable Membership Inference Attacks via Quantile Regression

Neural Information Processing Systems

Membership inference attacks are designed to determine, using black box access to trained models, whether a particular example was used in training or not. Membership inference can be formalized as a hypothesis testing problem. The most effective existing attacks estimate the distribution of some test statistic (usually the model's confidence on the true label) on points that were (and were not) used in training by training many shadow models--i.e.
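The hypothesis-testing view above can be made concrete with a deliberately simplified sketch. It calibrates one global threshold on the test statistic from points known not to be in the training set, then flags any point whose statistic exceeds it. This is neither the shadow-model attack nor the quantile-regression attack named in the title (which would make the threshold depend on the example); the function names and the choice of a single marginal quantile are assumptions for illustration.

```python
import math
from typing import List, Sequence


def quantile_threshold(nonmember_scores: Sequence[float], fpr: float) -> float:
    """Empirical (1 - fpr) quantile of the statistic on known non-members."""
    if not 0.0 < fpr < 1.0:
        raise ValueError("fpr must lie strictly between 0 and 1")
    if not nonmember_scores:
        raise ValueError("need at least one non-member score")
    ordered = sorted(nonmember_scores)
    # Pick the order statistic so that at most a fraction `fpr` of the
    # calibration scores lies strictly above the returned threshold.
    k = math.ceil((1.0 - fpr) * len(ordered)) - 1
    k = min(max(k, 0), len(ordered) - 1)
    return ordered[k]


def predict_membership(scores: Sequence[float], threshold: float) -> List[bool]:
    """Reject the null hypothesis 'not a member' when the score exceeds the threshold."""
    return [s > threshold for s in scores]
```

Here `fpr` plays the role of the significance level of the test: it bounds how often a genuine non-member from the calibration set would be flagged as a member.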


Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

Neural Information Processing Systems

We present a novel methodology aimed at optimizing the application of frozen large language models (LLMs) for resource-intensive vision-language (VL) pre-training. The current paradigm uses visual features as prompts to guide language models, with a focus on determining the most relevant visual features for corresponding text. Our approach diverges by concentrating on the language component, specifically identifying the optimal prompts to align with visual features. We introduce the Prompt-Transformer (P-Former), a model that predicts these ideal prompts, which is trained exclusively on linguistic data, bypassing the need for image-text pairings.


Calibeating Prediction-Powered Inference

arXiv.org Machine Learning

We study semisupervised mean estimation with a small labeled sample, a large unlabeled sample, and a black-box prediction model whose output may be miscalibrated. A standard approach in this setting is augmented inverse-probability weighting (AIPW) [Robins et al., 1994], which protects against prediction-model misspecification but can be inefficient when the prediction score is poorly aligned with the outcome scale. We introduce Calibrated Prediction-Powered Inference, which post-hoc calibrates the prediction score on the labeled sample before using it for semisupervised estimation. This simple step requires no retraining and can improve the original score both as a predictor of the outcome and as a regression adjustment for semisupervised inference. We study both linear and isotonic calibration. For isotonic calibration, we establish first-order optimality guarantees: isotonic post-processing can improve predictive accuracy and estimator efficiency relative to the original score and simpler post-processing rules, while no further post-processing of the fitted isotonic score yields additional first-order gains. For linear calibration, we show first-order equivalence to PPI++. We also clarify the relationship among existing estimators, showing that the original PPI estimator is a special case of AIPW and can be inefficient when the prediction model is accurate, while PPI++ is AIPW with empirical efficiency maximization [Rubin et al., 2008]. In simulations and real-data experiments, our calibrated estimators often outperform PPI and are competitive with, or outperform, AIPW and PPI++. We provide an accompanying Python package, ppi_aipw, at https://larsvanderlaan.github.io/ppi-aipw/.
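As a rough illustration of the linear-calibration variant described above, the sketch below fits an ordinary least-squares line of the outcome on the prediction score using the labeled sample, then plugs the calibrated score into an AIPW-style mean estimate. It does not use the `ppi_aipw` package and is not its API; the function names, the pooling of labeled and unlabeled scores in the plug-in term, and the fallback for a constant score are all assumptions.

```python
from statistics import fmean
from typing import Sequence, Tuple


def fit_linear_calibration(
    scores: Sequence[float], outcomes: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of outcome on score; returns (intercept, slope)."""
    s_bar, y_bar = fmean(scores), fmean(outcomes)
    var = sum((s - s_bar) ** 2 for s in scores)
    if var == 0.0:
        # Constant score: assumed fallback to the labeled outcome mean.
        return y_bar, 0.0
    cov = sum((s - s_bar) * (y - y_bar) for s, y in zip(scores, outcomes))
    slope = cov / var
    return y_bar - slope * s_bar, slope


def calibrated_mean_estimate(
    labeled_scores: Sequence[float],
    labeled_outcomes: Sequence[float],
    unlabeled_scores: Sequence[float],
) -> float:
    """AIPW-style mean estimate using the linearly calibrated score."""
    a, b = fit_linear_calibration(labeled_scores, labeled_outcomes)
    # Plug-in term: average calibrated score over the pooled sample (assumption).
    pooled = list(labeled_scores) + list(unlabeled_scores)
    plug_in = fmean(a + b * s for s in pooled)
    # Bias correction: average labeled residual of the calibrated score.
    # With an intercept in the fit this is zero up to rounding; it is kept
    # to show the general AIPW form.
    correction = fmean(
        y - (a + b * s) for s, y in zip(labeled_scores, labeled_outcomes)
    )
    return plug_in + correction
```

For example, with labeled scores `[0, 1, 2, 3]`, outcomes `[1, 3, 5, 7]`, and unlabeled scores `[4, 5, 6, 7]`, the score is on the wrong scale (the outcome equals 1 + 2 × score), and the calibration step rescales it before averaging.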




The Pope's Warnings About AI Were AI-Generated, a Detection Tool Claims

WIRED

Pangram Labs' updated Chrome extension puts warning labels on AI slop as you scroll your social feeds. On Monday, a brand-new Reddit account popped up on the widely read forum r/AmItheAsshole, where users have their personal disputes arbitrated by strangers. This particular user asked if they had crossed a line by "refusing to babysit my stepmother's kids because I have my own job and responsibilities." The post itself was succinct, straightforward, and grammatically clean, explaining a situation in which the person's stepmother and father often expected them to provide childcare on little notice, eventually leading to an argument. "Now there's tension at home, and I'm starting to wonder if I handled it the wrong way," the redditor concluded.