Goto

Collaborating Authors

 aaab6hicbvbns8naej3ur1q qh69lbbbu0leqmeif48t2a9oq9lsj 3azsbsboq gu8efdeqz


Robust Implicit Networks via Non-Euclidean Contractions

arXiv.org Machine Learning

Implicit neural networks, a.k.a., deep equilibrium networks, are a class of implicit-depth learning models where function evaluation is performed by solving a fixed point equation. They generalize classic feedforward models and are equivalent to infinite-depth weight-tied feedforward networks. While implicit models show improved accuracy and significant reduction in memory consumption, they can suffer from ill-posedness and convergence instability. This paper provides a new framework to design well-posed and robust implicit neural networks based upon contraction theory for the non-Euclidean norm $\ell_\infty$. Our framework includes (i) a novel condition for well-posedness based on one-sided Lipschitz constants, (ii) an average iteration for computing fixed-points, and (iii) explicit estimates on input-output Lipschitz constants. Additionally, we design a training problem with the well-posedness condition and the average iteration as constraints and, to achieve robust models, with the input-output Lipschitz constant as a regularizer. Our $\ell_\infty$ well-posedness condition leads to a larger polytopic training search space than existing conditions and our average iteration enjoys accelerated convergence. Finally, we perform several numerical experiments for function estimation and digit classification through the MNIST data set. Our numerical results demonstrate improved accuracy and robustness of the implicit models with smaller input-output Lipschitz bounds.


Group-wise Contrastive Learning for Neural Dialogue Generation

arXiv.org Artificial Intelligence

Neural dialogue response generation has gained much popularity in recent years. Maximum Likelihood Estimation (MLE) objective is widely adopted in existing dialogue model learning. However, models trained with MLE objective function are plagued by the low-diversity issue when it comes to the open-domain conversational setting. Inspired by the observation that humans not only learn from the positive signals but also benefit from correcting behaviors of undesirable actions, in this work, we introduce contrastive learning into dialogue generation, where the model explicitly perceives the difference between the well-chosen positive and negative utterances. Specifically, we employ a pretrained baseline model as a reference. During contrastive learning, the target dialogue model is trained to give higher conditional probabilities for the positive samples, and lower conditional probabilities for those negative samples, compared to the reference model. To manage the multi-mapping relations prevailed in human conversation, we augment contrastive dialogue learning with group-wise dual sampling. Extensive experimental results show that the proposed group-wise contrastive learning framework is suited for training a wide range of neural dialogue generation models with very favorable performance over the baseline training approaches.


DVE: Dynamic Variational Embeddings with Applications in Recommender Systems

arXiv.org Machine Learning

Embedding is a useful technique to project a high-dimensional feature into a low-dimensional space, and it has many successful applications including link prediction, node classification and natural language processing. Current approaches mainly focus on static data, which usually lead to unsatisfactory performance in applications involving large changes over time. How to dynamically characterize the variation of the embedded features is still largely unexplored. In this paper, we introduce a dynamic variational embedding (DVE) approach for sequence-aware data based on recent advances in recurrent neural networks. DVE can model the node's intrinsic nature and temporal variation explicitly and simultaneously, which are crucial for exploration. We further apply DVE to sequence-aware recommender systems, and develop an end-to-end neural architecture for link prediction.


Transfer Reinforcement Learning under Unobserved Contextual Information

arXiv.org Artificial Intelligence

In this paper, we study a transfer reinforcement learning problem where the state transitions and rewards are affected by the environmental context. Specifically, we consider a demonstrator agent that has access to a context-aware policy and can generate transition and reward data based on that policy. These data constitute the experience of the demonstrator. Then, the goal is to transfer this experience, excluding the underlying contextual information, to a learner agent that does not have access to the environmental context, so that they can learn a control policy using fewer samples. It is well known that, disregarding the causal effect of the contextual information, can introduce bias in the transition and reward models estimated by the learner, resulting in a learned suboptimal policy. To address this challenge, in this paper, we develop a method to obtain causal bounds on the transition and reward functions using the demonstrator's data, which we then use to obtain causal bounds on the value functions. Using these value function bounds, we propose new Q learning and UCB-Q learning algorithms that converge to the true value function without bias. We provide numerical experiments for robot motion planning problems that validate the proposed value function bounds and demonstrate that the proposed algorithms can effectively make use of the data from the demonstrator to accelerate the learning process of the learner.


Embedding Graph Auto-Encoder with Joint Clustering via Adjacency Sharing

arXiv.org Machine Learning

Graph convolution networks have attracted many attentions and several graph auto-encoder based clustering models are developed for attributed graph clustering. However, most existing approaches separate clustering and optimization of graph auto-encoder into two individual steps. In this paper, we propose a graph convolution network based clustering model, namely, Embedding Graph Auto-Encoder with JOint Clustering via Adjacency Sharing (\textit{EGAE-JOCAS}). As for the embedded model, we develop a novel joint clustering method, which combines relaxed k-means and spectral clustering and is applicable for the learned embedding. The proposed joint clustering shares the same adjacency within graph convolution layers. Two parts are optimized simultaneously through performing SGD and taking close-form solutions alternatively to ensure a rapid convergence. Moreover, our model is free to incorporate any mechanisms (e.g., attention) into graph auto-encoder. Extensive experiments are conducted to prove the superiority of EGAE-JOCAS. Sufficient theoretical analyses are provided to support the results.


Twin Auxiliary Classifiers GAN

arXiv.org Machine Learning

Conditional generative models enjoy remarkable progress over the past few years. One of the popular conditional models is Auxiliary Classifier GAN (AC-GAN), which generates highly discriminative images by extending the loss function of GAN with an auxiliary classifier. However, the diversity of the generated samples by AC-GAN tends to decrease as the number of classes increases, hence limiting its power on large-scale data. In this paper, we identify the source of the low diversity issue theoretically and propose a practical solution to solve the problem. We show that the auxiliary classifier in AC-GAN imposes perfect separability, which is disadvantageous when the supports of the class distributions have significant overlap. To address the issue, we propose Twin Auxiliary Classifiers Generative Adversarial Net (TAC-GAN) that further benefits from a new player that interacts with other players (the generator and the discriminator) in GAN. Theoretically, we demonstrate that TAC-GAN can effectively minimize the divergence between the generated and real-data distributions. Extensive experimental results show that our TAC-GAN can successfully replicate the true data distributions on simulated data, and significantly improves the diversity of class-conditional image generation on real datasets.


Word-order biases in deep-agent emergent communication

arXiv.org Artificial Intelligence

Sequence-processing neural networks led to remarkable progress on many NLP tasks. As a consequence, there has been increasing interest in understanding to what extent they process language as humans do. We aim here to uncover which biases such models display with respect to "natural" word-order constraints. We train models to communicate about paths in a simple gridworld, using miniature languages that reflect or violate various natural language trends, such as the tendency to avoid redundancy or to minimize long-distance dependencies. We study how the controlled characteristics of our miniature languages affect individual learning and their stability across multiple network generations. The results draw a mixed picture. On the one hand, neural networks show a strong tendency to avoid long-distance dependencies. On the other hand, there is no clear preference for the efficient, non-redundant encoding of information that is widely attested in natural language. We thus suggest inoculating a notion of "effort" into neural networks, as a possible way to make their linguistic behavior more human-like.


An investigation of model-free planning

arXiv.org Machine Learning

The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods have been proposed that learn how to plan, by providing the structure for planning via an inductive bias in the function approximator (such as a tree structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go even further, and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent's effectiveness at planning in terms of its ability to generalize across a combinatorial and irreversible state space, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics that one might expect to find in a planning algorithm. Furthermore, it exceeds the state-of-the-art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that utilize strong inductive biases toward planning.


Masking: A New Perspective of Noisy Supervision

Neural Information Processing Systems

It is important to learn various types of classifiers given training data with noisy labels. Noisy labels, in the most popular noise model hitherto, are corrupted from ground-truth labels by an unknown noise transition matrix. Thus, by estimating this matrix, classifiers can escape from overfitting those noisy labels. However, such estimation is practically difficult, due to either the indirect nature of two-step approaches, or not big enough data to afford end-to-end approaches. In this paper, we propose a human-assisted approach called ''Masking'' that conveys human cognition of invalid class transitions and naturally speculates the structure of the noise transition matrix. To this end, we derive a structure-aware probabilistic model incorporating a structure prior, and solve the challenges from structure extraction and structure alignment. Thanks to Masking, we only estimate unmasked noise transition probabilities and the burden of estimation is tremendously reduced. We conduct extensive experiments on CIFAR-10 and CIFAR-100 with three noise structures as well as the industrial-level Clothing1M with agnostic noise structure, and the results show that Masking can improve the robustness of classifiers significantly.


Learning to Optimize Tensor Programs

Neural Information Processing Systems

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution are key enablers of effective deep learning systems. However, existing systems rely on manually optimized libraries such as cuDNN where only a narrow range of server class GPUs are well-supported. The reliance on hardware specific operator libraries limits the applicability of high-level graph optimizations and incurs significant engineering costs when deploying to new hardware targets. We use learning to remove this engineering burden. We learn domain specific statistical cost models to guide the search of tensor operator implementations over billions of possible program variants. We further accelerate the search by effective model transfer across workloads. Experimental results show that our framework delivers performance competitive with state-of-the-art hand-tuned libraries for low-power CPU, mobile GPU, and server-class GPU.