

Random Forest-of-Thoughts: Uncertainty-aware Reasoning for Computational Social Science

Wu, Xiaohua, Tao, Xiaohui, Wu, Wenjie, Li, Yuefeng, Li, Lin

arXiv.org Artificial Intelligence

Social surveys in computational social science are carefully designed around elaborate domain theories so that they can elicit interviewees' deeper thoughts without them concealing their true feelings. Because the candidate questionnaire options depend heavily on an interviewee's previous answers, social survey analysis is complex and demands considerable time and expertise. The complex-reasoning abilities of large language models (LLMs) have been substantially enhanced by prompting techniques such as Chain-of-Thought (CoT), but these remain confined to left-to-right decision-making processes or a limited number of paths during inference, so they can fall short on problems that require exploration and search under uncertainty. In response, we propose a novel LLM prompting method, Random Forest of Thoughts (RFoT), which generates uncertainty-aware reasoning suited to computational social science. RFoT allows LLMs to perform deliberate decision-making by generating a diverse thought space and randomly selecting sub-thoughts to build a forest of thoughts, extending exploration and improving overall performance by drawing on a much larger search space of responses. We apply the method to computational social science analysis on two datasets covering a spectrum of social survey analysis problems. Our experiments show that RFoT significantly enhances language models' abilities on two novel social survey analysis problems requiring non-trivial reasoning.
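The abstract does not spell out the sampling procedure, but the forest-building idea can be sketched as follows: repeatedly grow a reasoning path by randomly selecting one of several candidate sub-thoughts at each step, then vote over the answers reached by the different trees. The `propose` and `answer` callables below stand in for LLM calls; they are hypothetical names for illustration, not the paper's API:

```python
import random
from collections import Counter

def grow_tree(question, propose, depth, rng):
    """Grow one reasoning path by randomly selecting a sub-thought at each step."""
    context, path = question, []
    for _ in range(depth):
        candidates = propose(context)      # diverse candidate thoughts (LLM call)
        thought = rng.choice(candidates)   # random sub-thought selection
        path.append(thought)
        context = context + " " + thought
    return path

def random_forest_of_thoughts(question, propose, answer, n_trees=7, depth=3, seed=0):
    """Majority-vote over answers produced by many randomly grown reasoning paths."""
    rng = random.Random(seed)
    votes = Counter(answer(grow_tree(question, propose, depth, rng))
                    for _ in range(n_trees))
    return votes.most_common(1)[0][0]
```

With a real LLM, `propose` would sample diverse candidate thoughts for the current context and `answer` would extract a final answer from a completed reasoning path.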


ViTGAN: Training GANs with Vision Transformers

Lee, Kwonjoon, Chang, Huiwen, Jiang, Lu, Zhang, Han, Tu, Zhuowen, Liu, Ce

arXiv.org Artificial Intelligence

Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring fewer vision-specific inductive biases. In this paper, we investigate whether such performance can be extended to image generation. To this end, we integrate the ViT architecture into generative adversarial networks (GANs). For ViT discriminators, we observe that existing regularization methods for GANs interact poorly with self-attention, causing serious instability during training. To resolve this issue, we introduce several novel regularization techniques for training GANs with ViTs. For ViT generators, we examine architectural choices for the latent and pixel mapping layers to facilitate convergence. Empirically, our approach, named ViTGAN, achieves performance comparable to the leading CNN-based GAN models on three datasets: CIFAR-10, CelebA, and LSUN bedroom. Convolutional neural networks (CNNs) (LeCun et al., 1989) dominate computer vision today, thanks to their powerful capabilities of convolution (weight-sharing and local connectivity) and pooling (translation equivariance). Recently, however, Transformer architectures (Vaswani et al., 2017) have started to rival CNNs in many vision tasks. In particular, Vision Transformers (ViTs) (Dosovitskiy et al., 2021), which interpret an image as a sequence of tokens (analogous to words in natural language), have been shown to achieve comparable classification accuracy with smaller computational budgets (i.e., fewer FLOPs) on the ImageNet benchmark. Unlike CNNs, ViTs capture a different inductive bias through self-attention, where each patch attends to all patches of the same image. ViTs, along with their variants (Touvron et al., 2020; Tolstikhin et al., 2021), though still in their infancy, have demonstrated advantages in modeling non-local contextual dependencies (Ranftl et al., 2021; Strudel et al., 2021) as well as promising efficiency and scalability.
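The patch-token mechanism described above, i.e. an image interpreted as a sequence of patch tokens with each patch attending to all patches, can be illustrated with a minimal NumPy sketch. This shows plain ViT-style attention, not ViTGAN's specific regularizers, which the abstract does not detail:

```python
import numpy as np

def patchify(img, p):
    """Split an (H, W, C) image into flattened p x p patch tokens."""
    H, W, C = img.shape
    patches = img.reshape(H // p, p, W // p, p, C).swapaxes(1, 2)
    return patches.reshape(-1, p * p * C)           # (num_patches, patch_dim)

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention: every patch token attends to all others."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over all patches
    return weights @ v
```

The attention weights form a dense `num_patches x num_patches` matrix, which is the non-local inductive bias the passage contrasts with convolution's locality.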


Learning to Extend Molecular Scaffolds with Structural Motifs

Maziarz, Krzysztof, Jackson-Flux, Henry, Cameron, Pashmina, Sirockin, Finton, Schneider, Nadine, Stiefl, Nikolaus, Segler, Marwin, Brockschmidt, Marc

arXiv.org Artificial Intelligence

Recent advancements in deep learning-based modeling of molecules promise to accelerate in silico drug discovery. A plethora of generative models is available, building molecules either atom-by-atom and bond-by-bond or fragment-by-fragment. However, many drug discovery projects require a fixed scaffold to be present in the generated molecule, and incorporating that constraint has only recently been explored. Here, we propose MoLeR, a graph-based model that naturally supports scaffolds as an initial seed for the generative procedure, which is possible because it is not conditioned on the generation history. Our experiments show that MoLeR performs comparably to state-of-the-art methods on unconstrained molecular optimization tasks, and outperforms them on scaffold-based tasks, while being an order of magnitude faster to train and sample from than existing approaches. Furthermore, we show the influence of a number of seemingly minor design choices on the overall performance.


MTLight: Efficient Multi-Task Reinforcement Learning for Traffic Signal Control

Zhu, Liwen, Peng, Peixi, Lu, Zongqing, Tian, Yonghong

arXiv.org Artificial Intelligence

Traffic signal control has a great impact on alleviating traffic congestion in modern cities. Deep reinforcement learning (RL) has been widely used for this task in recent years, demonstrating promising performance but also facing many challenges, such as limited performance and sample inefficiency. To handle these challenges, MTLight is proposed to enhance the agent's observation with a latent state, which is learned from numerous traffic indicators. Meanwhile, multiple auxiliary and supervisory tasks are constructed to learn the latent state, and two types of embedded latent features, a task-specific feature and a task-shared feature, are used to make the latent state richer. Extensive experiments conducted on CityFlow demonstrate that MTLight achieves leading convergence speed and asymptotic performance. We further simulate peak-hour patterns in all scenarios with increasing control difficulty, and the results indicate that MTLight is highly adaptable.
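As a rough illustration of the observation-augmentation step, the two latent features are concatenated onto the raw observation before it reaches the policy. The encoders here are arbitrary stand-ins, not MTLight's learned networks:

```python
import numpy as np

def augment_observation(obs, indicators, shared_encoder, task_encoder):
    """Concatenate the raw observation with task-shared and task-specific
    latent features computed from the traffic indicators."""
    z_shared = shared_encoder(indicators)  # feature shared across all tasks
    z_task = task_encoder(indicators)      # feature specific to this task
    return np.concatenate([obs, z_shared, z_task])
```

In the paper's setup the encoders would be trained via the auxiliary supervision tasks; here any callables mapping indicators to vectors will do.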


PF-GNN: Differentiable particle filtering based approximation of universal graph representations

Dupty, Mohammed Haroon, Dong, Yanfei, Lee, Wee Sun

arXiv.org Artificial Intelligence

Message passing Graph Neural Networks (GNNs) are known to be limited in expressive power by the 1-WL color-refinement test for graph isomorphism. Other more expressive models either are computationally expensive or need preprocessing to extract structural features from the graph. In this work, we propose to make GNNs universal by guiding the learning process with exact isomorphism solver techniques which operate on the paradigm of Individualization and Refinement (IR), a method to artificially introduce asymmetry and further refine the coloring when 1-WL stops. Isomorphism solvers generate a search tree of colorings whose leaves uniquely identify the graph. However, the tree grows exponentially large and needs hand-crafted pruning techniques which are not desirable from a learning perspective. We take a probabilistic view and approximate the search tree of colorings (i.e., embeddings) by sampling multiple paths from the root to the leaves of the search tree. To learn more discriminative representations, we guide the sampling process with particle filter updates, a principled approach for sequential state estimation. Our algorithm is end-to-end differentiable, can be applied with any GNN as a backbone, and learns richer graph representations with only a linear increase in runtime. Experimental evaluation shows that our approach consistently outperforms leading GNN models on both synthetic benchmarks for isomorphism detection and real-world datasets.
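The particle filter update mentioned above, i.e. propagate, reweight by a likelihood, and resample so that high-weight paths survive, is a standard sequential-estimation step. A generic sketch with systematic resampling (not the paper's exact scheme over colorings) looks like this:

```python
import numpy as np

def particle_filter_step(particles, weights, transition, likelihood, rng):
    """One sequential update: propagate particles, reweight, resample."""
    particles = np.array([transition(p, rng) for p in particles])
    weights = weights * np.array([likelihood(p) for p in particles])
    weights = weights / weights.sum()
    # Systematic resampling: evenly spaced positions with one random offset
    # keep high-weight particles (here: search-tree paths) alive.
    n = len(particles)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    idx = np.clip(idx, 0, n - 1)  # guard against float round-off at the top bin
    return particles[idx], np.full(n, 1.0 / n)
```

In PF-GNN the "transition" would extend a path in the IR search tree and the "likelihood" would come from learned scores, making the whole update differentiable.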


Non-Denoising Forward-Time Diffusions

Peluchetti, Stefano

arXiv.org Machine Learning

The scope of this paper is generative modeling through diffusion processes. An approach falling within this paradigm is the work of Song et al. (2021), which relies on a time-reversal argument to construct a diffusion process targeting the desired data distribution. We show that the time-reversal argument, common to all denoising diffusion probabilistic modeling proposals, is not necessary. We obtain diffusion processes targeting the desired data distribution by taking appropriate mixtures of diffusion bridges. The resulting transport is exact by construction, allows for greater flexibility in choosing the dynamics of the underlying diffusion, and can be approximated by means of a neural network via novel training objectives. We develop a unifying view of the drift adjustments corresponding to our and to time-reversal approaches and make use of this representation to inspect the inner workings of diffusion-based generative models. Finally, we leverage scalable simulation and inference techniques common in spatial statistics to move beyond fully factorial distributions in the underlying diffusion dynamics. The methodological advances contained in this work contribute toward establishing a general framework for generative modeling based on diffusion processes.
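One concrete instance of the bridge-mixture construction, assuming the standard Brownian bridge (the paper's general dynamics may differ), is:

```latex
% Brownian bridge pinned to endpoint y at time 1:
\mathrm{d}X_t \;=\; \frac{y - X_t}{1 - t}\,\mathrm{d}t \;+\; \mathrm{d}W_t,
\qquad t \in [0, 1).
% Mixing the endpoint over the data distribution, Y \sim p_{\mathrm{data}},
% yields a forward diffusion whose time-1 marginal is p_{\mathrm{data}},
% with drift
b(x, t) \;=\; \mathbb{E}\!\left[\,\frac{Y - X_t}{1 - t} \;\middle|\; X_t = x\,\right].
```

No time reversal is needed: the marginal at $t = 1$ equals the data distribution by construction, which is one way to read the "exact by construction" claim in the abstract.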


Optimizing Neural Networks with Gradient Lexicase Selection

Ding, Li, Spector, Lee

arXiv.org Artificial Intelligence

One potential drawback of using aggregated performance measurement in machine learning is that models may learn to accept higher errors on some training cases as compromises for lower errors on others, with the lower errors actually being instances of overfitting. This can lead to both stagnation at local optima and poor generalization. Lexicase selection is an uncompromising method developed in evolutionary computation, which selects models on the basis of sequences of individual training case errors instead of using aggregated metrics such as loss and accuracy. In this paper, we investigate how lexicase selection, in its general form, can be integrated into the context of deep learning to enhance generalization. We propose Gradient Lexicase Selection, an optimization framework that combines gradient descent and lexicase selection in an evolutionary fashion. Our experimental results demonstrate that the proposed method improves the generalization performance of various widely-used deep neural network architectures across three image classification benchmarks. Additionally, qualitative analysis suggests that our method assists networks in learning more diverse representations. Modern data-driven learning algorithms, in general, define an optimization objective, e.g., a fitness function for parent selection in genetic algorithms (Holland, 1992) or a loss function for gradient descent in deep learning (LeCun et al., 2015), which computes the aggregate performance on the training data to guide the optimization process. Taking the image classification problem as an example, most recent approaches use Cross-Entropy loss with gradient descent (Bottou, 2010) and backpropagation (Rumelhart et al., 1985) to train deep neural networks (DNNs) on batches of training images. 
Despite the success of advanced DNNs in reaching human-level performance on the image recognition task (Russakovsky et al., 2015), one potential drawback of such aggregated performance measurement is that the model may learn to seek "compromises" during the learning procedure, e.g., optimizing model weights to intentionally retain some errors in order to gain a higher likelihood on correct predictions.
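Lexicase selection itself is simple to state: shuffle the training cases, then repeatedly keep only the candidates that are elite on the next case, until one survivor remains. A minimal sketch of the selection step (not the full gradient-descent hybrid the paper proposes):

```python
import random

def lexicase_select(population, case_errors, rng):
    """Select one model via lexicase selection over per-case errors.

    population: list of model ids; case_errors[m][c] = error of model m on case c.
    """
    candidates = list(population)
    cases = list(range(len(case_errors[candidates[0]])))
    rng.shuffle(cases)                # a fresh random case ordering per selection
    for c in cases:
        best = min(case_errors[m][c] for m in candidates)
        candidates = [m for m in candidates if case_errors[m][c] == best]
        if len(candidates) == 1:
            break                     # a single survivor; no aggregation involved
    return rng.choice(candidates)     # tie-break among remaining survivors
```

Note that no loss is ever summed: a model survives only by being elite on individual cases, which is exactly the "uncompromising" property described above.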


Zero-Shot Self-Supervised Learning for MRI Reconstruction

Yaman, Burhaneddin, Hosseini, Seyed Amir Hossein, Akçakaya, Mehmet

arXiv.org Artificial Intelligence

Deep learning (DL) has emerged as a powerful tool for accelerated MRI reconstruction, but often necessitates a database of fully-sampled measurements for training. Recent self-supervised and unsupervised learning approaches enable training without fully-sampled data. However, a database of undersampled measurements may not be available in many scenarios, especially for scans involving contrast or translational acquisitions in development. Moreover, recent studies show that database-trained models may not generalize well when the unseen measurements differ in terms of sampling pattern, acceleration rate, SNR, image contrast, and anatomy. Such challenges necessitate a new methodology to enable subject-specific DL MRI reconstruction without external training datasets, since it is clinically imperative to provide high-quality reconstructions that can be used to identify lesions/disease for every individual. In this work, we propose a zero-shot self-supervised learning approach to perform subject-specific accelerated DL MRI reconstruction to tackle these issues. The proposed approach partitions the available measurements from a single scan into three disjoint sets. Two of these sets are used to enforce data consistency and define the loss during training for self-supervision, while the last set serves to self-validate, establishing an early stopping criterion. In the presence of models pre-trained on a database with different image characteristics, we show that the proposed approach can be combined with transfer learning for faster convergence time and reduced computational complexity. Magnetic resonance imaging (MRI) is a non-invasive, radiation-free medical imaging modality that provides excellent soft tissue contrast for diagnostic purposes.
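The three-way split of a single scan's measurements can be sketched directly; the fractions below are arbitrary placeholders, not the paper's settings:

```python
import numpy as np

def partition_measurements(indices, frac_loss=0.3, frac_val=0.1, seed=0):
    """Split acquired measurement indices into three disjoint sets:
    data consistency, training loss, and self-validation (early stopping)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(indices)
    n = len(shuffled)
    n_val = int(frac_val * n)
    n_loss = int(frac_loss * n)
    val, loss, dc = np.split(shuffled, [n_val, n_val + n_loss])
    return dc, loss, val
```

During training, the `dc` set is fed to the network for data consistency, the `loss` set defines the self-supervised loss, and reconstruction error on the held-out `val` set triggers early stopping.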


Hidden Parameter Recurrent State Space Models For Changing Dynamics Scenarios

Shaj, Vaisakh, Buchler, Dieter, Sonker, Rohit, Becker, Philipp, Neumann, Gerhard

arXiv.org Artificial Intelligence

Recurrent State-Space Models (RSSMs) are highly expressive models for learning patterns in time-series data and for system identification. However, these models assume that the dynamics are fixed and unchanging, which is rarely the case in real-world scenarios. Many control applications exhibit tasks with similar but not identical dynamics, which can be modeled with a latent variable. We introduce Hidden Parameter Recurrent State Space Models (HiP-RSSMs), a framework that parametrizes a family of related dynamical systems with a low-dimensional set of latent factors. We present a simple and effective way of learning and performing inference over this Gaussian graphical model that avoids approximations like variational inference. We show that HiP-RSSMs outperform RSSMs and competing multi-task models on several challenging robotic benchmarks, both on real-world systems and in simulation.


Constraining Linear-chain CRFs to Regular Languages

Papay, Sean, Klinger, Roman, Padó, Sebastian

arXiv.org Artificial Intelligence

A major challenge in structured prediction is to represent the interdependencies within output structures. When outputs are structured as sequences, linear-chain conditional random fields (CRFs) are a widely used model class which can learn \textit{local} dependencies in the output. However, the CRF's Markov assumption makes it impossible for CRFs to represent distributions with \textit{nonlocal} dependencies, and standard CRFs are unable to respect nonlocal constraints of the data (such as global arity constraints on output labels). We present a generalization of CRFs that can enforce a broad class of constraints, including nonlocal ones, by specifying the space of possible output structures as a regular language $\mathcal{L}$. The resulting regular-constrained CRF (RegCCRF) has the same formal properties as a standard CRF, but assigns zero probability to all label sequences not in $\mathcal{L}$. Notably, RegCCRFs can incorporate their constraints during training, while related models only enforce constraints during decoding. We prove that constrained training is never worse than constrained decoding, and show empirically that it can be substantially better in practice. Additionally, we demonstrate a practical benefit on downstream tasks by incorporating a RegCCRF into a deep neural model for semantic role labeling, exceeding state-of-the-art results on a standard dataset.
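The zero-probability constraint can be demonstrated with a brute-force toy: score every label sequence, keep only those whose label string is accepted by a regular expression, and normalize over that set. A real RegCCRF composes a finite automaton with the CRF lattice and includes transition scores; this emission-only enumeration is just for intuition:

```python
import itertools
import math
import re

def regccrf_distribution(scores, labels, pattern):
    """Toy constrained CRF: per-position label scores, with every label
    sequence outside the regular language assigned probability zero."""
    lang = re.compile(pattern)
    weights = {}
    for seq in itertools.product(labels, repeat=len(scores)):
        if lang.fullmatch("".join(seq)):          # in-language sequences only
            weights[seq] = math.exp(sum(s[l] for s, l in zip(scores, seq)))
    Z = sum(weights.values())                     # partition function over the language
    return {seq: w / Z for seq, w in weights.items()}
```

Because normalization runs only over in-language sequences, the constraint is enforced during training as well as decoding, mirroring the property highlighted above.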