On the Scalability of GNNs for Molecular Graphs

Neural Information Processing Systems

Scaling deep learning models has been at the heart of recent revolutions in language modelling and image generation. Practitioners have observed a strong relationship between model size, dataset size, and performance. However, structure-based architectures such as Graph Neural Networks (GNNs) are yet to show the benefits of scale mainly due to lower efficiency of sparse operations, large data requirements, and lack of clarity about the effectiveness of various architectures. We address this drawback of GNNs by studying their scaling behavior. Specifically, we analyze message-passing networks, graph Transformers, and hybrid architectures on the largest public collection of 2D molecular graphs for supervised pretraining.





Distributed Saddle-Point Problems Under Similarity

Neural Information Processing Systems

The local functions at each node are assumed to be similar, due to statistical data similarity or otherwise. We establish lower complexity bounds for a fairly general class of algorithms solving the SPP. We show that a given suboptimality ε > 0 is achieved over master/workers networks in Ω(Δ · δ/µ · log(1/ε)) rounds of communications, where δ > 0 measures the degree of similarity of the local functions, µ is their strong convexity constant, and Δ is the diameter of the network. The lower communication complexity bound over mesh networks reads Ω(1/√ρ · δ/µ · log(1/ε)), where ρ is the (normalized) eigengap of the gossip matrix used for the communication between neighbouring nodes. We then propose algorithms matching the lower bounds over either type of network (up to log-factors). We assess the effectiveness of the proposed algorithms on a robust regression problem.
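The two communication lower bounds stated in the abstract can be written in display form (δ = degree of similarity of the local functions, µ = strong convexity constant, Δ = network diameter, ρ = normalized eigengap of the gossip matrix, ε = target suboptimality):

```latex
% Master/workers topology:
\Omega\!\left(\Delta \cdot \frac{\delta}{\mu}\,\log\frac{1}{\varepsilon}\right)
\quad\text{rounds of communication}

% Mesh topology (gossip-based communication):
\Omega\!\left(\frac{1}{\sqrt{\rho}} \cdot \frac{\delta}{\mu}\,\log\frac{1}{\varepsilon}\right)
\quad\text{rounds of communication}
```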



RETR: Multi-View Radar Detection Transformer for Indoor Perception, Pu (Perry) Wang

Neural Information Processing Systems

Indoor radar perception has seen rising interest due to affordable costs driven by emerging automotive imaging radar developments and the benefits of reduced privacy concerns and reliability under hazardous conditions (e.g., fire and smoke). However, existing radar perception pipelines fail to account for distinctive characteristics of the multi-view radar setting. In this paper, we propose Radar dEtection TRansformer (RETR), an extension of the popular DETR architecture, tailored for multi-view radar perception. RETR inherits the advantages of DETR, eliminating the need for hand-crafted components for object detection and segmentation in the image plane. More importantly, RETR incorporates carefully designed modifications such as 1) depth-prioritized feature similarity via a tunable positional encoding (TPE); 2) a tri-plane loss from both radar and camera coordinates; and 3) a learnable radar-to-camera transformation via reparameterization, to account for the unique multi-view radar setting. Evaluated on two indoor radar perception datasets, our approach outperforms existing state-of-the-art methods by a margin of 15.38+ AP for object detection and 11.91+ IoU for instance segmentation, respectively.
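The abstract names depth-prioritized feature similarity via a tunable positional encoding (TPE) but does not spell out its form. Below is a hypothetical NumPy sketch of the general idea, not the paper's actual design: the depth half of a sinusoidal encoding is scaled by a tunable factor, so dot-product attention similarity is dominated by depth agreement across views. All names and the `depth_scale` parameter are assumptions of ours.

```python
import numpy as np

def sinusoidal(pos, dim):
    """Standard 1-D sinusoidal positional encoding for coordinates `pos`."""
    i = np.arange(dim // 2)
    freqs = 1.0 / (10000.0 ** (2.0 * i / dim))
    ang = pos[:, None] * freqs[None, :]                         # (N, dim/2)
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)  # (N, dim)

def tunable_pe(depth, azimuth, dim=64, depth_scale=2.0):
    """Illustrative depth-prioritized encoding: the depth component is
    weighted by a tunable factor so that attention similarity between
    multi-view radar features prioritizes agreement in depth."""
    return np.concatenate(
        [depth_scale * sinusoidal(depth, dim // 2),   # boosted depth part
         sinusoidal(azimuth, dim // 2)],              # unscaled angular part
        axis=-1,
    )
```

With `depth_scale > 1`, two feature vectors at the same depth but different azimuths score a higher dot-product similarity than the reverse, which is one plausible reading of "depth-prioritized".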


A Diffusion Noise Schedule

Neural Information Processing Systems

We find that standard noise schedules for continuous diffusions are not robust for text data. We hypothesize that the discrete nature of text and the rounding step make the model insensitive to noise near t = 0. Concretely, adding a small amount of Gaussian noise to a word embedding is unlikely to change its nearest neighbor in the embedding space, making denoising an easy task near t = 0. The sqrt schedule then slows down noise injection to avoid spending too many steps on the high-noise regime, which may be too difficult to solve well. The hyperparameters that are specific to Diffusion-LM include the number of diffusion steps, the architecture of the Diffusion-LM, the embedding dimension, and the noise schedule. We set the diffusion steps to 2000, the architecture to BERT-base [7], and the sequence length to 64. For the embedding dimension, we select from d ∈ {16, 64, 128, 256}, choosing d = 16 for the E2E dataset and d = 128 for ROCStories. For the noise schedule, we design the sqrt schedule (Appendix A), which is more robust to different parametrizations and embedding dimensions, as shown in Appendix M. We train Diffusion-LMs using the AdamW optimizer with a linearly decaying learning rate starting at 1e-4, dropout of 0.1, and batch size of 64; the total number of training iterations is 200K for the E2E dataset and 800K for ROCStories. It takes approximately 5 hours to train for 200K iterations on a single A100 GPU. To achieve controllable generation, we run gradient updates on the continuous latents of Diffusion-LM. We use the AdaGrad optimizer [10] to update the latent variables, and we tune the learning rate lr ∈ {0.05, 0.1, 0.15, 0.2} and the trade-off parameter λ ∈ {0.1, 0.01, …}. Different plug-and-play controllable generation approaches trade off between fluency and control by tuning different hyperparameters: PPLM uses the number of gradient updates per token, denoted k, and we tune k ∈ {10, 30}.
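As a minimal sketch of the sqrt schedule described above, assuming the common parameterization ᾱ(t) = 1 − √(t/T + s) with a small offset s (s = 1e-4 here is an assumed default, and the clipping to [0, 1] is our own safeguard):

```python
import numpy as np

def sqrt_alpha_bar(num_steps=2000, s=1e-4):
    """sqrt noise schedule: alpha_bar(t) = 1 - sqrt(t/T + s).

    Noise is injected quickly near t = 0, where denoising text is easy
    (small perturbations rarely change a word embedding's nearest
    neighbor), and more slowly at high noise levels.
    """
    t = np.arange(num_steps + 1)
    return np.clip(1.0 - np.sqrt(t / num_steps + s), 0.0, 1.0)

alpha_bar = sqrt_alpha_bar()  # monotonically non-increasing, starts near 1
```

Compared to a linear schedule, this spends fewer diffusion steps in the near-noiseless regime where the rounding step makes denoising trivial.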



Complete Graphical Criterion for Sequential Covariate Adjustment in Causal Inference, Yonghan Jung, Min Woo Park, Sanghack Lee, Purdue University

Neural Information Processing Systems

Covariate adjustment, also known as back-door adjustment, is a fundamental tool in causal inference. Although a sound and complete graphical identification criterion, known as the adjustment criterion (Shpitser et al., 2010), exists for static contexts, sequential contexts present challenges. Current practices, such as the sequential back-door adjustment (Pearl and Robins, 1995) or multi-outcome sequential back-door adjustment (Jung et al., 2020), are sound but incomplete; i.e., there are graphical scenarios where the causal effect is expressible via covariate adjustment, yet these criteria fail to cover them. In this paper, we exemplify this incompleteness and then present the sequential adjustment criterion, a sound and complete criterion for sequential covariate adjustment. We provide a constructive sequential adjustment criterion that identifies a set satisfying the sequential adjustment criterion if and only if the causal effect can be expressed as a sequential covariate adjustment. Finally, we present an algorithm for identifying a minimal sequential covariate adjustment set, which optimizes efficiency by ensuring that no unnecessary vertices are included.
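For intuition, the general shape of a sequential covariate adjustment, in the style of the sequential back-door formula of Pearl and Robins (1995), expresses the effect of interventions x_1, …, x_n on an outcome Y as (notation is ours, not the paper's):

```latex
P\big(y \mid \mathrm{do}(x_1,\dots,x_n)\big)
= \sum_{z_1,\dots,z_n}
  P\big(y \mid x_1,\dots,x_n,\, z_1,\dots,z_n\big)
  \prod_{k=1}^{n}
  P\big(z_k \mid z_1,\dots,z_{k-1},\, x_1,\dots,x_{k-1}\big)
```

Each adjustment set Z_k may depend on earlier treatments and covariates; the paper's criterion characterizes exactly when such sets exist.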