Meta Guidance: Incorporating Inductive Biases into Deep Time Series Imputers

Neural Information Processing Systems

Missing values, frequently encountered in time series data, can significantly impair the effectiveness of analytical methods. While deep imputation models have emerged as the predominant approach due to their superior performance, explicitly incorporating inductive biases aligned with time-series characteristics offers substantial improvement potential. Taking advantage of non-stationarity and periodicity in time series, two domain-specific inductive biases are designed: (1) Non-Stationary Guidance, which operationalizes the proximity principle to address highly non-stationary series by emphasizing temporal neighbors, and (2) Periodic Guidance, which exploits periodicity patterns through learnable weight allocation across historical periods. Building upon these complementary mechanisms, the overall module, named Meta Guidance, dynamically fuses both guidances through data-adaptive weights learned from the specific input sample. Experiments on nine benchmark datasets demonstrate that integrating Meta Guidance into existing deep imputation architectures achieves an average 27.39% reduction in imputation error compared to state-of-the-art baselines.
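As a rough illustration of the two guidances described above, the following sketch fills one missing point from (a) nearby observed neighbors and (b) the same phase in earlier periods, then mixes the two. It is not the authors' implementation: the function names, the exponential decay, the fixed `weights`, and the scalar `alpha` are all assumptions made for illustration, whereas the abstract describes weights that are learned and adapted to each input sample.

```python
import math

def proximity_guidance(series, t, window=3, tau=1.0):
    """Weighted mean of observed neighbors of index t; nearer points weigh more.

    Hypothetical stand-in for Non-Stationary Guidance (fixed decay, not learned).
    """
    num = den = 0.0
    for s in range(max(0, t - window), min(len(series), t + window + 1)):
        if s != t and series[s] is not None:
            w = math.exp(-abs(s - t) / tau)
            num += w * series[s]
            den += w
    return num / den if den else None

def periodic_guidance(series, t, period, weights):
    """Weighted mean of observed values at t - k*period for k = 1..len(weights).

    Hypothetical stand-in for Periodic Guidance (weights are given, not learned).
    """
    num = den = 0.0
    for k, w in enumerate(weights, start=1):
        s = t - k * period
        if s >= 0 and series[s] is not None:
            num += w * series[s]
            den += w
    return num / den if den else None

def fuse(a, b, alpha):
    """Convex combination of two guidance values; falls back if one is missing."""
    if a is None:
        return b
    if b is None:
        return a
    return alpha * a + (1.0 - alpha) * b

# Toy series with period 4 and one missing value (None) at index 6.
series = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, None, 4.0]
near = proximity_guidance(series, t=6, window=1)
past = periodic_guidance(series, t=6, period=4, weights=[1.0])
estimate = fuse(near, past, alpha=0.5)
```

In this toy case both guidances agree, so the mixing weight does not matter; on a strongly non-stationary series one would expect the proximity term to deserve more weight, which is the motivation the abstract gives for fusing them adaptively.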


A Implies B: Circuit Analysis in LLMs for Propositional Logical Reasoning

Neural Information Processing Systems

Due to the size and complexity of modern large language models (LLMs), it has proven challenging to uncover the underlying mechanisms that models use to solve reasoning problems. For instance, is their reasoning for a specific problem localized to certain parts of the network? Do they break down the reasoning problem into modular components that are then executed as sequential steps as we go deeper in the model? To better understand the reasoning capability of LLMs, we study a minimal propositional logic problem that requires combining multiple facts to arrive at a solution. By studying this problem on Mistral and Gemma models, up to 27B parameters, we illuminate the core components the models use to solve such logic problems. From a mechanistic interpretability point of view, we use causal mediation analysis to uncover the pathways and components of the LLMs' reasoning processes. Then, we offer fine-grained insights into the functions of attention heads in different layers. We not only find a sparse circuit that computes the answer, but also decompose it into sub-circuits that have four distinct and modular uses. Finally, we reveal that three distinct models (Mistral-7B, Gemma-2-9B, and Gemma-2-27B) contain analogous but not identical mechanisms.


Towards Unified and Lossless Latent Space for 3D Molecular Latent Diffusion Modeling

Neural Information Processing Systems

A key challenge is integrating these modalities of different shapes while maintaining SE(3) equivariance for 3D coordinates. To achieve this, existing approaches typically maintain separate latent spaces for invariant and equivariant modalities, reducing efficiency in both training and sampling. In this work, we propose Unified Variational Auto-Encoder for 3D Molecular Latent Diffusion Modeling (UAE-3D), a multi-modal VAE that compresses 3D molecules into latent sequences from a unified latent space, while maintaining near-zero reconstruction error. This unified latent space eliminates the complexities of handling multi-modality and equivariance when performing latent diffusion modeling. We demonstrate this by employing the Diffusion Transformer, a general-purpose diffusion model without any molecular inductive bias, for latent generation. Extensive experiments on the GEOM-Drugs and QM9 datasets demonstrate that our method establishes new benchmarks in both de novo and conditional 3D molecule generation, achieving leading efficiency and quality. On GEOM-Drugs, it reduces FCD by 72.6% over the previous best result, while achieving over 70% relative average improvements in geometric fidelity. Our code is released at https://github.com/lyc0930/UAE-3D/.


Robust Explanations of Graph Neural Networks via Graph Curvatures

Neural Information Processing Systems

Explaining graph neural networks (GNNs) is a key approach to improving the trustworthiness of GNNs in high-stakes applications, such as finance and healthcare. However, existing methods are vulnerable to perturbations, raising concerns about explanation reliability. Prior methods enhance explanation robustness using model retraining or explanation ensembles, each with certain weaknesses. Retraining leads to models that differ from the original target model and to misleading explanations, while ensembles can produce contradictory results due to different inputs or models. To improve explanation robustness without the above weaknesses, we take an unexplored route and exploit two geometric properties of edges, curvature and resistance, to enhance explanation robustness. We are the first to prove that these geometric notions can be used to bound explanation robustness. We design a general optimization algorithm to incorporate these geometric properties into a wide spectrum of base GNN explanation methods to enhance the robustness of base explanations. We empirically show that our method outperforms six base explanation methods in robustness across nine datasets spanning node classification, link prediction, and graph classification tasks, improving fidelity in 80% of the cases and achieving up to a 10% relative improvement in robust performance.


MMCSBench: A Fine-Grained Benchmark for Large Vision-Language Models in Camouflage Scenes

Neural Information Processing Systems

Current camouflaged object detection methods predominantly follow discriminative segmentation paradigms and rely heavily on predefined categories present in the training data, limiting their generalization to unseen or emerging camouflaged objects. This limitation is further compounded by the labor-intensive and time-consuming nature of collecting camouflage imagery. Although Large Vision-Language Models (LVLMs) show potential to alleviate such issues with their powerful generative capabilities, their understanding of camouflage scenes is still insufficient. To bridge this gap, we introduce MMCSBench, the first comprehensive multimodal benchmark designed to evaluate and advance LVLM capabilities in camouflage scenes. MMCSBench comprises 22,537 images and 76,843 corresponding image-text pairs across five fine-grained camouflage tasks. Additionally, we propose a new task, Camouflage Efficacy Assessment (CEA), aimed at quantitatively evaluating the camouflage effectiveness of objects in images and enabling automated collection of camouflage images from large-scale databases. Extensive experiments on 26 LVLMs reveal significant shortcomings in models' ability to perceive and interpret camouflage scenes. These findings highlight the fundamental differences between natural and camouflaged visual inputs, offering insights for future research in advancing LVLM capabilities within this challenging domain.


ϵ-Seg: Sparsely Supervised Semantic Segmentation of Microscopy Data

Neural Information Processing Systems

Semantic segmentation of electron microscopy (EM) images of biological samples remains a challenge in the life sciences. EM data captures details of biological structures, sometimes with such complexity that even human observers can find it overwhelming. We introduce ϵ-Seg, a method based on hierarchical variational autoencoders (HVAEs), employing center-region masking, sparse label contrastive learning (CL), a Gaussian mixture model (GMM) prior, and clustering-free label prediction. Center-region masking and the inpainting loss encourage the model to learn robust and representative embeddings to distinguish the desired classes, even if training labels are sparse (0.05% of the total image data or less). For optimal performance, we employ CL and a GMM prior to shape the latent space of the HVAE such that encoded input patches tend to cluster w.r.t. the semantic classes we wish to distinguish. Finally, instead of clustering latent embeddings for semantic segmentation, we propose an MLP semantic segmentation head to directly predict class labels from latent embeddings. We show empirical results of ϵ-Seg and baseline methods on 2 dense EM datasets of biological tissues and also demonstrate the applicability of our method to fluorescence microscopy data. Our results show that ϵ-Seg is capable of achieving competitive sparsely-supervised segmentation results on complex biological image data, even if only limited amounts of training labels are available.


Diffusion Transformers as Open-World Spatiotemporal Foundation Models

Neural Information Processing Systems

The urban environment is characterized by complex spatio-temporal dynamics arising from diverse human activities and interactions. Effectively modeling these dynamics is essential for understanding and optimizing urban systems. In this work, we introduce UrbanDiT, a foundation model for open-world urban spatiotemporal learning that successfully scales up diffusion transformers in this field.


Improving the Generation and Evaluation of Synthetic Data for Downstream Medical Causal Inference

Neural Information Processing Systems

Causal inference is essential for developing and evaluating medical interventions, yet real-world medical datasets are often difficult to access due to regulatory barriers. This makes synthetic data a potentially valuable asset that enables these medical analyses, along with the development of new inference methods themselves. Generative models can produce synthetic data that closely approximate real data distributions, yet existing methods do not consider the unique challenges that downstream causal inference tasks, and specifically those focused on treatments, pose. We establish a set of desiderata that synthetic data containing treatments should satisfy to maximise downstream utility: preservation of (i) the covariate distribution, (ii) the treatment assignment mechanism, and (iii) the outcome generation mechanism. Based on these desiderata, we propose a set of evaluation metrics to assess such synthetic data. Finally, we present STEAM: a novel method for generating Synthetic data for Treatment Effect Analysis in Medicine that mimics the data-generating process of data containing treatments and optimises for our desiderata. We empirically demonstrate that STEAM achieves state-of-the-art performance across our metrics as compared to existing generative models, particularly as the complexity of the true data-generating process increases.
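The three desiderata above line up with the standard factorization of the joint distribution over covariates, treatment, and outcome. The notation below ($X$ for covariates, $W$ for treatment, $Y$ for outcome, $p$ for the real distribution, $q$ for the synthetic one) is assumed here for illustration and is not taken from the abstract:

```latex
% Joint distribution factored in the order the data are generated
p(X, W, Y) = \underbrace{p(X)}_{\text{(i) covariate distribution}}
             \cdot \underbrace{p(W \mid X)}_{\text{(ii) treatment assignment}}
             \cdot \underbrace{p(Y \mid W, X)}_{\text{(iii) outcome generation}}

% A synthetic distribution q preserves all three components when
q(X) \approx p(X), \qquad
q(W \mid X) \approx p(W \mid X), \qquad
q(Y \mid W, X) \approx p(Y \mid W, X)
```

Read this way, a generator that matches only the joint $p(X, W, Y)$ on average can still distort one of the conditional factors, which is one plausible reading of why the abstract treats the three components separately.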


Elon Musk's Trillion-Dollar Week Turned Out to Be Something Much Darker

Slate

His fortunes reached new heights while his online behavior reached new lows.


Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations

Neural Information Processing Systems

Modern tokenizers employ deterministic algorithms to map text into a single "canonical" token sequence, yet the same string can be encoded as many non-canonical tokenizations using the tokenizer vocabulary. In this work, we investigate the robustness of LMs to text encoded with non-canonical tokenizations entirely unseen during training. Surprisingly, when evaluated across 20 benchmarks, we find that instruction-tuned models retain up to 93.4% of their original performance when given a randomly sampled tokenization, and 90.8% with character-level tokenization. We see that overall stronger models tend to be more robust, and robustness diminishes as the tokenization departs farther from the canonical form. Motivated by these results, we then identify settings where non-canonical tokenization schemes can improve performance, finding that character-level segmentation improves string manipulation and code understanding tasks by up to +14%, and right-aligned digit grouping enhances large-number arithmetic by +33%. Finally, we investigate the source of this robustness, finding that it arises in the instruction-tuning phase. We show that while both base and post-trained models grasp the semantics of non-canonical tokenizations (perceiving them as containing misspellings), base models try to mimic the imagined mistakes and degenerate into nonsensical output, while post-trained models are committed to fluent responses. Overall, our findings suggest that models are less tied to their tokenizer than previously believed, and demonstrate the promise of intervening on tokenization at inference time to boost performance.
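To make the segmentation schemes mentioned above concrete, here is a minimal sketch that works on plain strings rather than real token IDs. The toy vocabulary and all function names are hypothetical; an actual tokenizer has a much larger vocabulary and picks one canonical sequence among the candidates enumerated here.

```python
def char_level(text):
    """Character-level segmentation: every character becomes its own piece."""
    return list(text)

def right_aligned_digit_groups(digits, size=3):
    """Split a digit string into groups of `size`, aligned from the right.

    Any leftover digits form a shorter group at the front.
    """
    head = len(digits) % size
    groups = [digits[:head]] if head else []
    groups += [digits[i:i + size] for i in range(head, len(digits), size)]
    return groups

def all_segmentations(text, vocab):
    """Enumerate every way to split `text` into pieces that are all in `vocab`."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        piece = text[:end]
        if piece in vocab:
            for rest in all_segmentations(text[end:], vocab):
                results.append([piece] + rest)
    return results

# Toy vocabulary (an assumption for illustration only).
toy_vocab = {"a", "b", "c", "ab", "bc", "abc"}
candidates = all_segmentations("abc", toy_vocab)
grouped = right_aligned_digit_groups("1234567")
```

With this toy vocabulary the string "abc" has four valid segmentations, only one of which a deterministic tokenizer would emit; the others are the kind of non-canonical inputs the abstract studies. Right-aligned grouping makes each group correspond to a fixed power of a thousand, which may be why it helps with large-number arithmetic.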