

AR-Pro: Counterfactual Explanations for Anomaly Repair with Formal Properties

Neural Information Processing Systems

Anomaly detection is widely used for identifying critical errors and suspicious behaviors, but current methods lack interpretability. We leverage common properties of existing methods and recent advances in generative models to introduce counterfactual explanations for anomaly detection. Given an input, we generate its counterfactual as a diffusion-based repair that shows what a non-anomalous version should have looked like. A key advantage of this approach is that it enables a domain-independent formal specification of explainability desiderata, offering a unified framework for generating and evaluating explanations. We demonstrate the effectiveness of our anomaly explainability framework, AR-Pro, on vision (MVTec, VisA) and time-series (SWaT, WADI, HAI) anomaly datasets. The code used for the experiments is accessible at: https://github.com/xjiae/arpro.
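The repair-as-counterfactual idea can be illustrated with a toy sketch. This is not AR-Pro's method (which uses a learned diffusion model over real anomaly data); here, iterative local smoothing stands in for the denoiser, and the signal, anomaly score, and threshold are all illustrative assumptions:

```python
import numpy as np

def anomaly_score(x, baseline):
    # per-point squared deviation from the expected (non-anomalous) signal
    return (x - baseline) ** 2

def repair(x, mask, steps=200, strength=0.5):
    # Crude stand-in for a diffusion denoiser: iteratively pull flagged
    # points toward a local average of their neighbours.
    x = x.copy()
    for _ in range(steps):
        smoothed = np.convolve(x, np.ones(3) / 3, mode="same")
        x[mask] = (1 - strength) * x[mask] + strength * smoothed[mask]
    return x

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
baseline = np.sin(2 * np.pi * t)
x = baseline + 0.01 * rng.standard_normal(200)
x[90:110] += 2.0                         # inject an anomaly

score = anomaly_score(x, baseline)
mask = score > 0.5                       # flag the anomalous region
x_cf = repair(x, mask)                   # counterfactual "repaired" input
```

The repaired `x_cf` is the counterfactual: it shows what the flagged region should have looked like, and its anomaly score on the masked region drops accordingly.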


Ghost kitchen delivery drivers have overrun an Echo Park neighborhood, say frustrated residents

Los Angeles Times

As soon as Echo Park Eats opened on the corner of Sunset Boulevard and Douglas Street in the fall of 2023, Sandy Romero said her neighborhood became overrun with delivery drivers. "The first day that they opened business it was chaotic, unorganized and it's just such a nuisance now," she said. Echo Park Eats is a ghost kitchen, a meal preparation hub for app-based delivery orders. It rents its kitchens to 26 different food vendors. The facility is part of CloudKitchens, led by Travis Kalanick, co-founder of Uber Technologies, which has kitchen locations across the nation including 11 in Los Angeles County.


Supplementary Materials for: Temporal Spike Sequence Learning via Backpropagation for Deep Spiking Neural Networks

Neural Information Processing Systems

According to Figure 6 of the main manuscript, in the LIF model the intra-neuron dependencies are caused by the firing-and-resetting mechanism. The experimented SNNs are based on the LIF model described in (4) of the main manuscript. The simulation step size is set to 1 ms. Only a few time steps are used to demonstrate low-latency spiking neural computation. Parameters such as thresholds and learning rates are empirically tuned.
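A minimal single-neuron sketch of the LIF dynamics with the firing-and-resetting mechanism, using the 1 ms step mentioned above; the time constant, threshold, and drive are illustrative values, not the paper's tuned parameters:

```python
import numpy as np

def simulate_lif(input_current, tau=20.0, v_th=1.0, v_reset=0.0, dt=1.0):
    # Leaky integrate-and-fire with the firing-and-resetting mechanism
    # that creates the intra-neuron dependencies discussed above.
    v = 0.0
    spikes = []
    for i_t in input_current:
        v = v + (dt / tau) * (-v + i_t)   # leaky integration, 1 ms step
        if v >= v_th:
            spikes.append(1)              # firing ...
            v = v_reset                   # ... and resetting
        else:
            spikes.append(0)
    return np.array(spikes)

spikes = simulate_lif(np.full(100, 5.0))  # constant drive for 100 ms
```

With this constant drive the membrane potential crosses threshold every 5 ms, so the reset after each spike is exactly the intra-neuron dependency that the backpropagation rule must account for.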


Mixture of Link Predictors on Graphs

Neural Information Processing Systems

Link prediction, which aims to forecast unseen connections in graphs, is a fundamental task in graph machine learning. Heuristic methods, leveraging a range of different pairwise measures such as common neighbors and shortest paths, often rival the performance of vanilla Graph Neural Networks (GNNs). Therefore, recent advancements in GNNs for link prediction (GNN4LP) have primarily focused on integrating one or a few types of pairwise information. In this work, we reveal that different node pairs within the same dataset necessitate varied pairwise information for accurate prediction, and models that apply the same pairwise information uniformly to all pairs can achieve only suboptimal performance. As a result, we propose a simple mixture-of-experts model, Link-MoE, for link prediction. Link-MoE utilizes various GNNs as experts and strategically selects the appropriate expert for each node pair based on various types of pairwise information. Experimental results across diverse real-world datasets demonstrate substantial performance improvement from Link-MoE. Notably, Link-MoE achieves a relative improvement of 18.71% on the MRR metric for the Pubmed dataset and 9.59% on the Hits@100 metric for the ogbl-ppa dataset, compared to the best baselines. The code is available at https://github.com/ml-ml/Link-MoE/.
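The per-pair gating over experts can be sketched in a few lines. This is not Link-MoE itself (whose experts are trained GNNs and whose gate is learned); the hand-set weights and heuristic "experts" below are assumptions chosen only to show the mechanism:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Two toy "experts", each scoring a candidate link from one kind of
# pairwise information, mimicking GNN experts specialised to different signals.
def common_neighbor_expert(feats):
    return feats[:, 0]            # score from the common-neighbor count

def shortest_path_expert(feats):
    return -feats[:, 1]           # closer pairs -> higher score

def gate(feats, W):
    # per-pair mixing weights over experts, computed from pairwise features
    return softmax(feats @ W)

# pairwise features per candidate pair: [common neighbors, shortest-path length]
feats = np.array([[5.0, 2.0],     # many common neighbors, nearby
                  [0.0, 6.0]])    # no common neighbors, far apart
W = np.array([[1.0, -1.0],        # hand-set gating weights for illustration
              [-1.0, 1.0]])

weights = gate(feats, W)                      # shape (n_pairs, n_experts)
expert_scores = np.stack([common_neighbor_expert(feats),
                          shortest_path_expert(feats)], axis=1)
link_scores = (weights * expert_scores).sum(axis=1)
```

The key point is that the gate assigns different mixing weights to different node pairs, so each pair is scored mostly by the expert whose pairwise signal suits it.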


Rectifying the Shortcut Learning of Background for Few-Shot Learning

Neural Information Processing Systems

The category gap between training and evaluation has been characterised as one of the main obstacles to the success of Few-Shot Learning (FSL). In this paper, we for the first time empirically identify image background, common in realistic images, as a shortcut knowledge helpful for in-class classification but ungeneralizable beyond training categories in FSL. A novel framework, COSOC, is designed to tackle this problem by extracting foreground objects in images at both training and evaluation without any extra supervision. Extensive experiments carried on inductive FSL tasks demonstrate the effectiveness of our approaches.


Supplementary Material: A. Proofs & Derivations — A.1 Finite- and Infinite-Horizon Variational Objectives

Neural Information Processing Systems

To solve the variational problem in Equation (A.5), we can define a factorized variational family. Returning to the variational problem in Equation (A.5), we can then write the divergence in factorized form. Following Haarnoja et al. [16], we define the variational distribution over next states to be the true transition dynamics.

Corollary 1 (Fixed-Time Outcome-Driven Reward Function). Parts of this proof are adapted from the proof given in Haarnoja et al. [16], modified for the Bellman operator proposed in Definition 1.

1. Outcome-Driven Policy Evaluation (ODPE): Instead of absorbing the entropy term into the Q-function, we can define an entropy-augmented reward. The first condition is true by assumption. Therefore, we apply convergence results for policy evaluation with transition-dependent discount factors [53] to this contraction mapping, and the result follows immediately. Convergence follows from Outcome-Driven Policy Evaluation above.

Remark 2. The convergence proof of ODPE assumes a transition-dependent discount factor [53], because the variational distribution used in Equation (11) depends on the next state and action, as well as on the desired outcome.
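Two of the definitions referenced here lost their equations in extraction. A hedged reconstruction in the soft actor-critic convention of Haarnoja et al. [16] (the symbols $q$, $p$, and $\tilde{r}$ follow that convention and are not necessarily the paper's exact notation):

```latex
% variational distribution over next states fixed to the true dynamics
q(s_{t+1} \mid s_t, a_t) = p(s_{t+1} \mid s_t, a_t)

% entropy-augmented reward absorbing the policy's entropy term
\tilde{r}(s_t, a_t) = r(s_t, a_t) + \mathcal{H}\big(\pi(\cdot \mid s_t)\big)
```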


EEVR: A Dataset of Paired Physiological Signals and Textual Descriptions for Joint Emotion Representation Learning

Neural Information Processing Systems

Figure 2: The figure presents still images extracted from the 360 videos used in the experiment to display various environments to the participants. The videos were selected from the publicly available 360 VR video dataset (Li et al., 2017). The EEVR dataset comprises synchronized pairs of physiological signals and textual data. It includes responses to four self-assessment questions regarding perceived arousal, valence, dominance, and discrete emotion ratings collected using the PANAS questionnaire (which were further utilized to calculate Positive and Negative Affect Scores). The EEVR dataset was collected using Virtual Reality (VR) 360 videos as the elicitation medium. The videos utilized in the dataset were selected based on their arousal and valence ratings to cover all four quadrants of Russell's circumplex model of emotion (Russell et al., 1989), as shown in Figure 2. The remainder of the supplementary materials provides detailed information about the EEVR dataset. Figure 3 provides a datasheet for the EEVR dataset, based on Gebru et al. (2018).


EEVR: A Dataset of Paired Physiological Signals and Textual Descriptions for Joint Emotion Representation Learning

Neural Information Processing Systems

EEVR (Emotion Elicitation in Virtual Reality) is a novel dataset specifically designed for language-supervised pre-training on emotion recognition tasks, such as valence and arousal classification. It features high-quality physiological signals, including electrodermal activity (EDA) and photoplethysmography (PPG), acquired through emotion elicitation via 360-degree virtual reality (VR) videos. Additionally, it includes subject-wise textual descriptions of the emotions experienced during each stimulus, gathered from qualitative interviews. The dataset consists of recordings from 37 participants and is the first dataset to pair raw text with physiological signals, providing additional contextual information that objective labels cannot offer. To leverage this dataset, we introduce the Contrastive Language Signal Pre-training (CLSP) method, which jointly learns representations using pairs of physiological signals and textual descriptions. Our results show that integrating self-reported textual descriptions with physiological signals significantly improves performance on emotion recognition tasks, such as arousal and valence classification. Moreover, our pre-trained CLSP model demonstrates strong zero-shot transferability to existing datasets, outperforming supervised baseline models, suggesting that the representations learned by our method are more contextualized and generalized. The dataset also includes baseline models for arousal, valence, and emotion classification, as well as code for data cleaning and feature extraction.
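The general contrastive recipe that CLSP builds on can be sketched as a symmetric InfoNCE objective over paired embeddings. This is the CLIP-style template, not CLSP's exact loss or architecture; the encoders producing the signal and text embeddings are assumed and replaced here by random vectors:

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def clip_style_loss(signal_emb, text_emb, temperature=0.07):
    # Symmetric InfoNCE over paired (signal, text) embeddings: matched
    # pairs sit on the diagonal of the cosine-similarity matrix.
    s = l2_normalize(signal_emb)
    t = l2_normalize(text_emb)
    logits = s @ t.T / temperature

    def xent(lg):
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.diag(logp).mean()

    # both retrieval directions: signal -> text and text -> signal
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
emb = rng.standard_normal((8, 16))
aligned = clip_style_loss(emb, emb + 0.01 * rng.standard_normal((8, 16)))
random_pairs = clip_style_loss(emb, rng.standard_normal((8, 16)))
```

When the two modalities are well aligned the loss is near zero; with unrelated pairings it sits near log(batch size), which is what makes the objective usable for joint representation learning.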


MTGS: A Novel Framework for Multi-Person Temporal Gaze Following and Social Gaze Prediction

Neural Information Processing Systems

Gaze following and social gaze prediction are fundamental tasks providing insights into human communication behaviors, intent, and social interactions. Most previous approaches addressed these tasks separately, either by designing highly specialized social gaze models that do not generalize to other social gaze tasks or by considering social gaze inference as an ad-hoc post-processing of the gaze following task. Furthermore, the vast majority of gaze following approaches have proposed models that can handle only one person at a time and are static, therefore failing to take advantage of social interactions and temporal dynamics. In this paper, we address these limitations and introduce a novel framework to jointly predict the gaze target and social gaze label for all people in the scene. It comprises (i) a temporal, transformer-based architecture that, in addition to frame tokens, handles person-specific tokens capturing the gaze information related to each individual; (ii) a new dataset, VSGaze, built from multiple gaze following and social gaze datasets by extending and validating head detections and tracks, and unifying annotation types. We demonstrate that our model can address and benefit from training on all tasks jointly, achieving state-of-the-art results for multi-person gaze following and social gaze prediction. Our annotations and code will be made publicly available.