Pairwise Causality Guided Transformers for Event Sequences

Neural Information Processing Systems

Although pairwise causal relations have been extensively studied in observational longitudinal analyses across many disciplines, incorporating knowledge of causal pairs into deep learning models for temporal event sequences remains largely unexplored. In this paper, we propose a novel approach for improving the performance of transformer-based models on multivariate event sequences by injecting pairwise qualitative causal knowledge such as 'event Z amplifies future occurrences of event Y'. We establish a new framework for causal inference in temporal event sequences using a transformer architecture, provide a theoretical justification for our approach, and show how to obtain unbiased estimates of the proposed measure. Experimental results demonstrate that our approach outperforms several state-of-the-art models in prediction accuracy by effectively leveraging knowledge about causal pairs. We also consider a unique application in which we extract knowledge about sequences of societal events by generating them with a large language model, and we demonstrate how a causal knowledge graph can help with event prediction in such sequences. Overall, our framework offers a practical means of improving the performance of transformer-based models on multivariate event sequences by explicitly exploiting pairwise causal information.
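The abstract does not specify how causal pairs enter the transformer. As a purely illustrative sketch, and not the paper's method, one simple way to inject a pair such as 'event Z amplifies future occurrences of event Y' is to add a fixed bonus to the attention logit whenever an event of type Y attends to an earlier event of type Z, while a causal mask keeps each position from attending to future events. The function name `causal_attention_weights`, the `bonus` parameter, and the additive-bias design are all hypothetical assumptions.

```python
import math


def causal_attention_weights(scores, event_types, causal_pairs, bonus=1.0):
    """Masked softmax attention with a hypothetical additive bias for known causal pairs.

    scores[i][j]: raw query-key score between query position i and key position j.
    event_types[k]: event type observed at position k.
    causal_pairs: set of (cause, effect) type pairs, e.g. {("Z", "Y")}.
    bonus: hypothetical logit bonus applied when the key's type is a known
           cause of the query's type (an illustrative assumption only).
    """
    n = len(event_types)
    weights = []
    for i in range(n):
        # Causal mask: query i only sees keys j <= i (no future events).
        logits = []
        for j in range(i + 1):
            bias = bonus if (event_types[j], event_types[i]) in causal_pairs else 0.0
            logits.append(scores[i][j] + bias)
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        # Masked (future) positions receive zero weight.
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights
```

For example, with all raw scores equal, types `["Z", "X", "Y"]`, and the pair `("Z", "Y")`, the event `Y` places more attention on the earlier `Z` than on `X`, while the first position can only attend to itself.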



TaskMet: Task-Driven Metric Learning for Model Learning

Neural Information Processing Systems

Deep learning models are often deployed in service of a downstream task. Models trained solely to make accurate predictions may perform poorly on that task. We propose using the task loss to learn a metric that parameterizes the loss used to train the prediction model. This approach does not alter the optimal prediction model itself; rather, it changes model learning to emphasize the information that matters for the downstream task. This gives us the best of both worlds: a prediction model trained in the original prediction space that is also valuable for the desired downstream task. We validate our approach in two main settings: 1) decision-focused model learning for portfolio optimization and budget allocation, and 2) reinforcement learning in noisy environments with distracting states. The source code to reproduce our experiments is publicly available.
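A minimal sketch of what "a metric which parameterizes a loss" can look like: the prediction error is measured as (ŷ − y)ᵀ M (ŷ − y) for a positive-definite metric M. Here M is assumed to be diagonal with positive entries obtained by exponentiating hypothetical parameters `log_weights`; this parameterization and the names `diagonal_metric` and `metric_loss` are illustrative assumptions. The outer loop that updates the metric using the task loss is omitted.

```python
import math


def diagonal_metric(log_weights):
    """Build a diagonal positive-definite metric from unconstrained (hypothetical) parameters."""
    n = len(log_weights)
    return [[math.exp(log_weights[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]


def metric_loss(pred, target, metric):
    """Squared prediction error measured in the metric: (pred - target)^T M (pred - target)."""
    diff = [p - t for p, t in zip(pred, target)]
    n = len(diff)
    return sum(diff[i] * metric[i][j] * diff[j] for i in range(n) for j in range(n))
```

With `diagonal_metric([math.log(4.0), 0.0])`, an error of 1 in the first output costs about 4 while the same error in the second costs 1, so training emphasizes the first output. A perfect prediction still has zero loss under any positive-definite metric, which is consistent with the claim that reweighting the loss does not change the optimal prediction model itself.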


Quantifying the Gain in Weak-to-Strong Generalization

Neural Information Processing Systems

Recent advances in large language models have demonstrated extraordinary, near-superhuman capabilities. These models operate with such complexity that reliably evaluating and aligning them proves challenging for humans. This raises a natural question: can guidance from weak models (like humans) adequately direct the capabilities of strong models?


A Computation and Implementation Details We propose several optimizations in the P

Neural Information Processing Systems

Explaining Website. For the website dataset, we explain product and user preferences in Figure 15. We generally found that, over the period considered, cosmetic products and the "Jersey Basic" category drove clicks.



Toward Dynamic Non-Line-of-Sight Imaging with Mamba Enforced Temporal Consistency

Neural Information Processing Systems

Dynamic reconstruction in confocal non-line-of-sight imaging faces major challenges because dense raster scanning limits the practical frame rate. A few pioneering works reconstruct high-resolution volumes from under-scanned transient measurements but overlook the temporal consistency among transient frames. To fully exploit multi-frame information, we propose the first spatial-temporal Mamba (ST-Mamba) based method tailored for dynamic reconstruction of transient videos. Our method capitalizes on neighbouring transient frames to aggregate the target 3D hidden volume. Specifically, interleaved features extracted from the input transient frames are fed to the proposed ST-Mamba blocks, which leverage the time-resolving causality in transient measurements.
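The ST-Mamba block itself is not described here, so the sketch below only illustrates the generic Mamba-style selective state-space recurrence along the time axis. It shows why such a scan respects causality: the output for frame t depends only on frames up to t. Everything is reduced to scalar parameters per channel, the step size Δ is made input-dependent through a softplus, the decay uses exp(Δ·a), and the input term uses the simplified Δ·b·x update. The names `selective_scan` and `softplus` and all parameter values are hypothetical assumptions; the paper's block also processes spatial features and likely differs substantially.

```python
import math


def softplus(v):
    """Smooth positive function used to produce a positive step size."""
    return math.log1p(math.exp(v))


def selective_scan(frames, a=-1.0, b=1.0, c=1.0, w_delta=1.0, b_delta=0.0):
    """Causal, input-dependent (selective) scan over per-frame feature vectors.

    frames: list of frames, each a list of floats (one value per channel).
    a: negative continuous-time decay rate, shared by all channels here.
    b, c: scalar input and output projections (simplifications).
    w_delta, b_delta: hypothetical parameters for the input-dependent step size.
    """
    channels = len(frames[0])
    h = [0.0] * channels  # one hidden state per channel
    outputs = []
    for x in frames:
        y = []
        for k in range(channels):
            delta = softplus(w_delta * x[k] + b_delta)  # step size depends on the input
            a_bar = math.exp(delta * a)                  # discretized state decay
            h[k] = a_bar * h[k] + delta * b * x[k]       # simplified input update
            y.append(c * h[k])
        outputs.append(y)
    return outputs
```

Changing the last frame of a clip leaves the outputs for all earlier frames unchanged, which is the causal property that a temporal scan over transient frames relies on.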


LMC: Large Model Collaboration with Cross-assessment for Training-Free Open-Set Object Recognition (Supplementary Material)

Neural Information Processing Systems

In Figure 1, we compare our LMC framework with the baseline Softmax and present qualitative results on the TinyImageNet dataset. Note that for the baseline Softmax, we do not simulate any virtual open-set classes. As shown, by simulating additional virtual open-set classes that share the spurious-discriminative features, our framework prevents the closed-set score S of an open-set test image from being easily overestimated, because the image is drawn close to both a closed-set class and some virtual open-set classes. This demonstrates the effectiveness of our framework in reducing reliance on spurious-discriminative features. In our experiments, following [1, 11], we use two metrics: AUROC and OSCR [3].
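For reference, both metrics can be computed from per-sample scores, assuming that a higher score (for instance, the closed-set score S) means "more likely to be a known class". The sketch below computes AUROC through the Mann-Whitney formulation and OSCR as the trapezoidal area under the curve of correct classification rate (CCR) against false positive rate (FPR) as the rejection threshold varies. The function names and the choice to count a sample as accepted when its score is at least the threshold are assumptions of this sketch, not details taken from [1, 11] or [3].

```python
def auroc(known_scores, unknown_scores):
    """Probability that a random known sample outscores a random unknown one (ties count 1/2)."""
    wins = 0.0
    for s_k in known_scores:
        for s_u in unknown_scores:
            if s_k > s_u:
                wins += 1.0
            elif s_k == s_u:
                wins += 0.5
    return wins / (len(known_scores) * len(unknown_scores))


def oscr(known_scores, known_correct, unknown_scores):
    """Area under the CCR-vs-FPR curve.

    known_correct[i] is True if the closed-set prediction for known sample i is right.
    CCR(t): fraction of known samples that are correctly classified and score >= t.
    FPR(t): fraction of unknown samples that score >= t.
    """
    thresholds = sorted(set(known_scores) | set(unknown_scores), reverse=True)
    points = [(0.0, 0.0)]  # (FPR, CCR) at a threshold above every score
    for t in thresholds:
        ccr = sum(1 for s, ok in zip(known_scores, known_correct) if ok and s >= t) / len(known_scores)
        fpr = sum(1 for s in unknown_scores if s >= t) / len(unknown_scores)
        points.append((fpr, ccr))
    # Trapezoidal rule with FPR on the x-axis; FPR is non-decreasing as t decreases.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Unlike AUROC, OSCR also penalizes known samples that are accepted but misclassified, so a perfectly separated but partly misclassified closed set scores below 1.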